OpenMed: Six Months of Open-Source Medical AI and the Road Ahead

Community Article Published January 6, 2026


Why I Built This

After seven years leading Spark NLP, watching it grow from three-digit download counts to 150 million, scaling from dozens of models to 130,000+ pretrained pipelines across 200+ languages, I handed the baton to the next generation and walked away with a question: What would I build if I had complete freedom?

I've spent 20 years scaling data and machine learning systems, from distributed databases and search engines to Kubernetes-orchestrated ML clusters, from cloud infrastructure to production AI pipelines serving millions. The last 11 years focused specifically on distributed enterprise NLP and LLMs: integrating TensorFlow and ONNX Runtime into the JVM (Scala) for GPU/CPU parity in scalable, production-ready enterprise environments, collaborating with Intel on OpenVINO optimizations and NVIDIA on CUDA acceleration, and making HuggingFace model imports a one-liner that communities loved. I've debugged production AI systems at Fortune 500 companies and watched research prototypes fail in the real world because they ignored operational reality.

I learned what works. More importantly, I learned what doesn't.

So in July 2025, six months ago, I started a lunch-break open-source project. No committees, no roadmap politics, no enterprise baggage. Just 20 years of hard-won lessons applied to a domain that desperately needs better tools: healthcare AI.

The problem is clear. Cutting-edge healthcare AI is locked behind expensive paywalls and opaque "black-box" systems. Medical institutions pay for enterprise licenses. Researchers struggle with API rate limits. Startups can't afford commercial NLP/LLM subscriptions. And everyone accepts this because for most clinical tasks, open-source alternatives simply don't exist. The rare academic models that do exist? They break in production.

I refused to accept that trade-off.

My manifesto was simple: open-source is the cornerstone of accelerating progress in AI, particularly in healthcare. These aren't free alternatives that compromise on quality. They're models engineered to go toe-to-toe with private and paid alternatives, delivering cutting-edge performance that empowers researchers, clinicians, and developers worldwide.

OpenMed launched on July 16, 2025 with 380+ state-of-the-art medical models, all freely available under the permissive Apache 2.0 license. Use them. Modify them. Build commercial applications. No restrictions. Freedom that proprietary options can't match.

This isn't just a release. It's a movement.

Today, I'm excited to share what I've built, and what the community has accomplished with these tools.


By the Numbers


29.7 million downloads. That's how many times OpenMed models have been pulled from HuggingFace. Add 551,800 downloads of the Python toolkit from PyPI across 13 releases. It's humbling to see this level of adoption in such a short time.

  • 481 models across two HuggingFace organizations
  • 551,800 PyPI downloads of the openmed package (13 releases since September)
  • 2,396 followers across X, LinkedIn, and HuggingFace
  • 97 GitHub stars on the toolkit repository
  • 45 models available on AWS Marketplace
  • 257 commits across 4 major version releases

But numbers only tell part of the story. What matters more is what people are building with these tools.


What I Shipped

July: The Foundation

I launched with 380+ medical NER models covering the full spectrum of clinical text analysis:

  • Disease and condition detection
  • Pharmaceutical and chemical entity recognition
  • Oncology and genomics analysis
  • Anatomy and species identification
  • Pathology and protein detection


Each model was domain-adapted on 12+ public biomedical datasets, trained specifically for clinical and research use cases. The top model, PharmaDetect-SuperClinical-434M, has been downloaded 147,000+ times.


The Architecture Spectrum


From 33M to 770M parameters, I built models for every deployment scenario:

  • TinyMed variants (33M-135M): Fast inference on CPU, perfect for real-time applications
  • SuperClinical/SuperMedical (125M-434M): Production workhorses balancing speed and accuracy
  • BigMed, MultiMed, XLarge (560M-770M): Maximum accuracy for research pipelines

This isn't just model variety for the sake of it. It's about meeting teams where they are, whether you're running on a laptop or orchestrating cloud inference at scale.

Zero-Shot Capabilities

I integrated GLiNER for zero-shot NER, enabling custom entity extraction without retraining. Define your labels, run inference, and iterate. This opened OpenMed to use cases beyond the 12 benchmark datasets I initially targeted.

The Python Toolkit

Models are powerful, but developer experience matters. I built a complete Python library with:

One-line inference:

from openmed import analyze_text

result = analyze_text("Patient presents with hypertension and diabetes.",
                      model="disease_detection_superclinical")

Production features:

  • Batch processing with progress tracking
  • Configuration profiles (dev/prod/test/fast)
  • Multiple output formats (JSON, CSV, HTML)
  • Medical-aware tokenization (a novel post-processing approach that preserves model integrity while producing cleaner clinical entities)
  • Sentence detection and automatic chunking for long documents
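
To illustrate the idea behind the last two features, here is a minimal, self-contained sketch of sentence detection plus automatic chunking: split on sentence boundaries, then pack sentences into chunks that fit a model's context window. The function name and regex are illustrative assumptions; the openmed toolkit's own chunker may work differently.

```python
import re

def chunk_sentences(text, max_chars=512):
    """Split text on sentence boundaries, then pack sentences into
    chunks no longer than max_chars so each chunk fits a model's
    input limit. (Illustrative sketch, not the openmed internals.)"""
    # Naive sentence splitter: break after ., !, or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sent in sentences:
        if current and len(current) + len(sent) + 1 > max_chars:
            chunks.append(current)
            current = sent
        else:
            current = f"{current} {sent}".strip()
    if current:
        chunks.append(current)
    return chunks

notes = ("Patient presents with hypertension. "
         "Metformin 500 mg was started. "
         "Follow-up in two weeks.")
for chunk in chunk_sentences(notes, max_chars=60):
    print(chunk)
```

Chunking on sentence boundaries, rather than at a fixed character offset, avoids splitting an entity mention in half across two model calls.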

CLI automation:

openmed analyze --model pharma_detection_superclinical clinical_notes.txt
openmed batch --pattern "data/**/*.txt" --output results.json

The Interactive TUI: Toward an AI-Native Medical Assistant

In December, I shipped v0.4.0 with a terminal user interface built on Textual. Think of it as the first step toward something bigger: a conversational AI-assisted medical analysis tool, similar to how Claude Code and GitHub Copilot transformed software development, but for healthcare.


Today's TUI:

  • Multi-line text input with paste support
  • Color-coded entity highlighting (diseases in red, drugs in blue, anatomy in green)
  • Live confidence visualization with progress bars
  • Hot-swap between models (F2), adjust thresholds (F3)
  • Analysis history and multi-format export
  • Runs on remote servers via SSH (analyze sensitive data without moving it locally)

Where it's heading: An intelligent medical AI agent that understands clinical context, suggests relevant models, explains findings in natural language, and assists with everything from de-identification to coding to literature review, all from the terminal.

This wasn't just about making things prettier. It's about building the foundation for AI-augmented clinical workflows that feel natural.

Enterprise Distribution: AWS Marketplace

Making research accessible is one thing. Making it enterprise-ready is another.

45 OpenMed models are now available on AWS Marketplace, enabling:

  • One-click deployment to AWS SageMaker
  • Compliance-friendly licensing
  • Enterprise billing and support
  • Integration with existing AWS infrastructure

This partnership brings OpenMed into production environments at healthcare organizations and research institutions worldwide.


The Research

I published the methodology and benchmarks to arXiv:

"OpenMed NER: Open-Source, Domain-Adapted State-of-the-Art Transformers for Biomedical NER Across 12 Public Datasets"

The paper demonstrates that domain adaptation with careful dataset curation can match or exceed proprietary medical NLP solutions. Open source doesn't mean compromising on quality.


What People Are Building


The most rewarding part hasn't been the download numbers. It's been the conversations.

Over the past six months, I've had 20+ meetings with researchers, clinicians, and healthcare practitioners. We've exchanged ideas, discussed real-world challenges, and refined what OpenMed needs to become. These weren't sales calls or demos. They were collaborative sessions where I listened to people who actually work with medical data every day, understanding their pain points, their compliance nightmares, their frustrations with existing tools.

That feedback shaped the roadmap. It's why de-identification and assertion status are Q1 priorities. It's why the TUI supports SSH workflows. It's why the toolkit has batch processing and configurable thresholds.

And I've seen these same people build impressive things:

  • Clinical researchers extracting entities from EHR notes at scale
  • Pharmaceutical teams analyzing drug mentions in research literature
  • Bioinformatics pipelines processing genomic annotations
  • Healthcare startups building compliance-aware de-identification tools
  • Students learning medical NLP without expensive API costs

Every download represents someone solving a real problem. That's what makes this work meaningful.


Community Growth

From zero to 2,396 followers in six months.

More importantly: 97 GitHub stars, active discussions, and feature requests that are shaping the roadmap.

This is still early. I'm just getting started.


What's Next: The 2026 Roadmap

The goal remains the same: beat enterprise alternatives with open-source models that anyone can use, audit, and trust.

Q1 2026: Privacy, Compliance, and Clinical Reasoning

I'm shipping models that solve real regulatory and clinical challenges:

PII Detection & De-identification (HIPAA, GDPR)

Healthcare data is sensitive. Compliance isn't optional. I'm releasing:

  • PHI detection models covering all 18 HIPAA Safe Harbor identifier types
  • GDPR-compliant de-identification for EU healthcare
  • Redaction and pseudonymization pipelines
  • Re-identification tracking for authorized research use
  • Built to outperform commercial de-ID solutions while being fully auditable
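
As a rough illustration of what a redaction pipeline does, the sketch below replaces two of the 18 Safe Harbor identifier types (dates and phone numbers) with typed placeholders. The patterns and names here are my own illustrative assumptions; the planned OpenMed de-ID models would use trained NER rather than regexes, which cannot cover identifiers like patient names.

```python
import re

# Illustrative patterns for two of the 18 HIPAA Safe Harbor identifier
# types. A production de-ID system replaces regexes with NER models.
PHI_PATTERNS = {
    "DATE":  re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def redact(text):
    """Replace each matched identifier with a typed placeholder,
    e.g. '[DATE]' -- a common pseudonymization convention."""
    for label, pattern in PHI_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

note = "Seen on 03/14/2025; callback 555-867-5309."
print(redact(note))  # Seen on [DATE]; callback [PHONE].
```

Typed placeholders (rather than blanket `***` masking) preserve document structure for downstream analytics, and they are what makes re-identification tracking for authorized research possible.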

Assertion Status Detection

Entity extraction is step one. Knowing whether a condition is present, absent, hypothetical, or historical is what turns NER into clinical decision support:

  • Fine-tuned models for assertion classification
  • Integration with existing NER pipelines
  • Negation and uncertainty detection
  • Temporal qualifiers (past, present, anticipated)
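
To make the present/absent/hypothetical distinction concrete, here is a minimal NegEx-style cue-matching sketch. The cue lists and function are illustrative assumptions of mine; the models described above are fine-tuned classifiers, not rule lists.

```python
# Minimal NegEx-style sketch: classify an entity mention as "absent"
# when a negation cue precedes it, "hypothetical" for conditional
# cues, else "present". Real assertion models learn these patterns.
NEGATION_CUES = ("no ", "denies ", "without ", "negative for ")
HYPOTHETICAL_CUES = ("if ", "rule out ", "possible ")

def assertion_status(sentence, entity):
    # Look only at the text before the entity mention.
    prefix = sentence.lower().split(entity.lower())[0]
    if any(cue in prefix for cue in NEGATION_CUES):
        return "absent"
    if any(cue in prefix for cue in HYPOTHETICAL_CUES):
        return "hypothetical"
    return "present"

print(assertion_status("Patient denies chest pain.", "chest pain"))  # absent
print(assertion_status("Rule out pneumonia.", "pneumonia"))          # hypothetical
print(assertion_status("Patient presents with hypertension.", "hypertension"))  # present
```

Even this toy version shows why assertion status matters: without it, "denies chest pain" and "presents with chest pain" produce the identical NER output.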

Biology & Life Sciences Models

Expanding beyond clinical text into biological research:

  • Protein and gene entity recognition
  • Pathway and molecular interaction detection
  • Laboratory result extraction
  • Genomic variant annotation
  • Built for bioinformatics pipelines and drug discovery
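
As a taste of the variant-annotation task, the sketch below pulls HGVS-style coding variants (e.g. `c.76A>T`) and protein variants (e.g. `p.Val600Glu`) out of free text with a regex. This pattern is an illustrative assumption covering only two common HGVS forms; the planned models would handle the many formats a regex misses.

```python
import re

# Illustrative HGVS-style patterns: coding substitutions (c.76A>T)
# and three-letter protein substitutions (p.Val600Glu) only.
VARIANT_RE = re.compile(
    r"\b(?:c\.\d+[ACGT]>[ACGT]|p\.[A-Z][a-z]{2}\d+[A-Z][a-z]{2})\b"
)

def extract_variants(text):
    """Return HGVS-style variant mentions in order of appearance."""
    return VARIANT_RE.findall(text)

abstract = "The BRAF p.Val600Glu mutation and a novel c.76A>T change were observed."
print(extract_variants(abstract))  # ['p.Val600Glu', 'c.76A>T']
```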

Medical LLMs: From 0.5B to 120B+ Parameters

Encoder models excel at extracting entities from text. Decoder-based LLMs understand context, reason about symptoms, and generate clinical insights. I'm releasing a spectrum of medical language models spanning 0.5B to 120B+ parameters, fine-tuned on clinical literature, case studies, and medical reasoning datasets.

These models will excel at clinical summarization (condensing lengthy EHR notes into actionable insights), differential diagnosis (suggesting potential conditions based on symptoms), patient triage (prioritizing cases by urgency), medical question answering, treatment recommendation synthesis, and clinical documentation automation. The goal isn't to replace clinicians but to augment their workflow with AI that understands medical nuance, catches edge cases humans might miss, and accelerates routine cognitive tasks.

These aren't generic LLMs with a medical system prompt. They're purpose-built for healthcare, trained on domain-specific data, and benchmarked against proprietary clinical AI systems. Open, auditable, and capable of running on-premise for HIPAA compliance.

All of these models share a common thread: they're designed to beat proprietary alternatives while remaining open, efficient, and production-ready.

Beyond Q1: The Long Game

The roadmap extends into concept linking (UMLS, ICD-10, CPT coding), clinical relation extraction, temporal reasoning, and social determinants of health (SDOH). I'm building a comprehensive healthcare AI stack, one open-source model at a time.

v1.0.0 will arrive when OpenMed is powering production systems at 10+ healthcare organizations, with full FHIR integration, benchmarking suites, and ensemble inference.
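
One planned v1.0.0 piece, ensemble inference, can be sketched at the span level: each model emits `(start, end, label)` entity spans, and a span survives when a strict majority of models agree. The function and voting rule below are my illustrative assumptions; the actual implementation may weight votes by model confidence instead.

```python
from collections import Counter

def ensemble_spans(predictions, min_votes=None):
    """Majority-vote ensemble over entity spans.

    predictions: one list of (start, end, label) spans per model.
    A span is kept when at least min_votes models emitted it
    (default: a strict majority)."""
    if min_votes is None:
        min_votes = len(predictions) // 2 + 1
    votes = Counter(span for model in predictions for span in set(model))
    return sorted(span for span, n in votes.items() if n >= min_votes)

model_a = [(0, 12, "DISEASE"), (20, 29, "DRUG")]
model_b = [(0, 12, "DISEASE")]
model_c = [(0, 12, "DISEASE"), (20, 29, "DRUG")]
print(ensemble_spans([model_a, model_b, model_c]))
# [(0, 12, 'DISEASE'), (20, 29, 'DRUG')]
```

Voting across models of different sizes is one way to trade a little latency for precision on high-stakes extractions.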

But the mission won't change: make healthcare AI smarter, together.


Lessons from Six Months


Building healthcare AI taught me things that no amount of enterprise software experience could have prepared me for. Here's what actually matters:

Listen first, build second

I spent as much time in meetings with researchers and clinicians as I did writing code. Understanding their real-world constraints (compliance nightmares, budget limitations, deployment restrictions) shaped what I built. Those conversations influenced everything from SSH-based workflows to configurable thresholds, while the global roadmap kept the vision intact.

Why this matters: Developer relations isn't a role; it's how you build products people actually need. Every feature in OpenMed came from listening to someone who works with medical data daily.

What I can do: Keep prioritizing those conversations. The 20+ meetings I've had with healthcare practitioners aren't networking: they're product development. When a hospital CTO tells you they can't deploy models cloned straight from GitHub, you learn what AWS Marketplace integration actually means.

Distribution is everything. Meet users where they are

Enterprise procurement doesn't work with "just clone from GitHub." Healthcare organizations, research institutions, and Fortune 500 companies operate within rigid constraints: approved vendor lists, compliance requirements, procurement workflows. I learned every major marketplace (AWS, Azure, GCP, Oracle, Databricks, Snowflake, CapGemini) because that's where enterprise users actually are.

Why this matters: HuggingFace is perfect for researchers. PyPI serves developers. But AWS Marketplace? That unlocks hospital IT departments, pharmaceutical companies, and health systems with real budgets. An open-source model that can't be procured is invisible to half your potential users.

What I can do: Build for where they work, not where you wish they worked. I shipped 380 models on day one instead of 5-10 because comprehensive coverage across medical domains made OpenMed immediately useful for diverse use cases. Distribution beats perfection. Go wide, go fast, then iterate based on real usage.

Open source builds trust, but healthcare is inexcusably behind

Look at every other domain: coding assistants, agentic workflows, computer vision, multimodal AI, robotics. There's a race. Meta releases Llama. Google counters with Gemma. Mistral AI, Alibaba with Qwen, DeepSeek, Hugging Face: everyone is competing to offer the best open-source AI. It's accelerating innovation at a pace we've never seen.

Yet healthcare and biology? Crickets. The fields where lives are literally at stake, where auditability matters most, where bias can kill: we're stuck with proprietary black boxes and paywalled APIs. The disconnect is staggering.

Why this matters: Open source isn't just a distribution model; it's how you earn trust in domains where mistakes have consequences. Publishing the OpenMed paper, open-sourcing every model, documenting everything: that created credibility no marketing budget could buy. Transparency isn't a nice-to-have. In healthcare, it's non-negotiable.

What I can do: Keep pushing. Every open-source medical model released is a small act of defiance against a system that normalizes opacity in life-or-death decisions. The race happening in other AI domains needs to happen in healthcare. Someone has to start it.

Community takes time, but quality compounds

I'm at 2,396 followers, not 20,000. That's fine. What matters is that I'm having meetings with all the right people: researchers at top institutions, clinicians managing real patient data, healthcare AI teams at Fortune 500 companies, and practitioners who actually understand the compliance nightmare.

Why this matters: The people here are engaged, building real systems, and pushing the project forward. A thousand Twitter followers who retweet don't move the needle. Ten hospital AI teams integrating OpenMed into production pipelines? That changes healthcare.

What I can do: Stay focused on depth over breadth. Quality over vanity metrics. The right 2,396 followers matter more than the wrong 20,000.


Thank You

To everyone who downloaded a model, filed an issue, or shared OpenMed with a colleague: thank you.

This project exists because of you.

If you're building something with OpenMed, I'd love to hear about it. Reach out on X, LinkedIn, or GitHub.

And if you haven't tried OpenMed yet, give it a shot:

uv pip install openmed
openmed  # Launch the interactive interface

Or explore the 481 models on HuggingFace and 45 on AWS Marketplace.

Here's to the next six months.




Maziyar Panahi, January 2026
