In 2023, Insilico Medicine announced ISM001-055, a drug candidate for idiopathic pulmonary fibrosis that the company describes as designed end to end by AI, from target identification to molecular generation to lead selection. It reached Phase 2 clinical trials in under three years from conception, a journey that typically takes a decade in traditional drug development. That milestone has become the emblem of a broader transformation: artificial intelligence is no longer merely a tool for processing medical data. It is becoming a co-investigator, a diagnostician, and, increasingly, a designer of the molecules that will define medicine in the coming decade.
The scale of the opportunity is difficult to overstate. Bringing a single drug to market currently takes 10–15 years and costs an estimated $2.6 billion on average. Diagnostic errors affect an estimated 12 million Americans annually. Most of the more than 7,000 identified rare diseases lack effective treatments, because their patient populations are too small to justify traditional research investment. AI cannot solve all of these problems, but it is beginning to chip away at each of them in ways that were not possible even five years ago.
Protein Structure Prediction: The AlphaFold Revolution
Understanding a drug’s mechanism of action requires understanding protein structure — how a molecule folds in three-dimensional space determines what other molecules it can bind to and what functions it performs. For 50 years, experimentally determining a protein structure required X-ray crystallography or cryo-electron microscopy: expensive, slow, and not always possible.
In 2020, DeepMind’s AlphaFold 2 achieved near-experimental accuracy in protein structure prediction for the first time, solving a problem that had been considered one of the grand challenges of biology for half a century. In 2024, AlphaFold 3 extended this capability to predict the structure of protein-DNA, protein-RNA, and protein-ligand interactions — the kinds of interactions that determine whether a small molecule drug will bind to its intended target. The AlphaFold Protein Structure Database, built in partnership with EMBL’s European Bioinformatics Institute, now contains predicted structures for over 200 million proteins — virtually the entire known proteome of life on Earth, publicly accessible.
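Predicted structures are only as useful as their confidence. AlphaFold writes a per-residue confidence score, pLDDT (0–100), into the B-factor column of its PDB-format output, and the AlphaFold Database treats scores below 70 as low confidence. The sketch below is a minimal illustration, assuming a PDB file already downloaded from the database: it reads pLDDT from the CA atom of each residue and lists low-confidence residues. The `make_atom_line` helper and the demo records are hypothetical, built only so the example runs without a real file.

```python
def plddt_by_residue(pdb_text: str) -> dict[int, float]:
    """Map residue number -> pLDDT, read from the B-factor column of CA atoms."""
    scores: dict[int, float] = {}
    for line in pdb_text.splitlines():
        # Fixed-width PDB columns (1-indexed): atom name 13-16,
        # residue sequence number 23-26, B-factor 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores[int(line[22:26])] = float(line[60:66])
    return scores


def low_confidence(scores: dict[int, float], threshold: float = 70.0) -> list[int]:
    """Residues whose pLDDT falls below the threshold (70 = AlphaFold DB 'low')."""
    return sorted(res for res, s in scores.items() if s < threshold)


def make_atom_line(serial, name, res_name, chain, res_seq, x, y, z, b_factor):
    """Hypothetical helper: build a synthetic ATOM record (occupancy fixed at 1.00)."""
    return (
        f"ATOM  {serial:>5} {name:<4} {res_name:>3} {chain}{res_seq:>4}    "
        f"{x:>8.3f}{y:>8.3f}{z:>8.3f}{1.0:>6.2f}{b_factor:>6.2f}"
    )


# Synthetic two-residue fragment; the N atom is skipped because only CA is read.
demo = "\n".join([
    make_atom_line(1, " N  ", "MET", "A", 1, 0.0, 0.0, 0.0, 91.5),
    make_atom_line(2, " CA ", "MET", "A", 1, 1.0, 0.0, 0.0, 91.5),
    make_atom_line(3, " CA ", "GLY", "A", 2, 2.0, 0.0, 0.0, 45.2),
])
scores = plddt_by_residue(demo)
print(scores)                  # {1: 91.5, 2: 45.2}
print(low_confidence(scores))  # [2]
```

Low-pLDDT regions often correspond to disordered or flexible segments, so a screening pipeline would typically check them before docking against a predicted pocket.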
Researchers at Recursion Pharmaceuticals have used AlphaFold structures to screen millions of compounds against hundreds of disease targets computationally, identifying candidates in weeks that would have taken years of experimental screening. Relay Therapeutics uses protein motion simulation (not just static structures) to design inhibitors for traditionally “undruggable” targets. BioAge Labs is using multi-omic aging biomarkers to identify targets for longevity therapeutics — a category that essentially did not exist as a serious pharmaceutical pursuit before AI made target identification tractable.
AI Diagnostics: What the Numbers Actually Show
AI diagnostic tools have generated considerable hype. They have also generated considerable evidence. The numbers deserve close examination.
Lung nodule detection: A 2023 study published in Nature Medicine compared AI performance to radiologist performance in detecting lung cancer nodules from CT scans. The AI system achieved 94.4% sensitivity and 91.7% specificity, compared to 89.6% sensitivity and 90.3% specificity for board-certified radiologists. The AI also performed significantly better on smaller nodules (under 6mm) where early detection matters most.
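Sensitivity and specificity both come from a confusion matrix. Sensitivity is the share of scans with nodules that a reader, human or model, flags. Specificity is the share of nodule-free scans that the reader correctly clears. The minimal sketch below uses hypothetical counts only to show the arithmetic. It then applies the two sensitivities reported above to 1,000 patients with nodules present.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Counts from comparing a reader's calls against ground truth."""
    tp: int  # nodule present, flagged
    fn: int  # nodule present, missed
    tn: int  # no nodule, correctly cleared
    fp: int  # no nodule, wrongly flagged

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)


def extra_detections(sens_a: float, sens_b: float, n_with_disease: int) -> int:
    """Additional true positives reader A finds over reader B among n diseased patients."""
    return round((sens_a - sens_b) * n_with_disease)


# Hypothetical counts, for illustration only (not from the study).
reader = ConfusionCounts(tp=45, fn=5, tn=190, fp=10)
print(f"sensitivity={reader.sensitivity():.1%}, specificity={reader.specificity():.1%}")

# Sensitivities reported above: AI 94.4% vs radiologists 89.6%.
print(extra_detections(0.944, 0.896, 1000))  # 48
```

A gap of 4.8 percentage points works out to roughly 48 additional detections per 1,000 patients whose scans contain nodules.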
Diabetic retinopathy screening: IDx-DR, the first AI diagnostic authorized by the FDA to provide a screening decision without a clinician also interpreting the images, detected more-than-mild diabetic retinopathy with 87.2% sensitivity and 90.7% specificity in its pivotal trial. It has been deployed in primary care settings, allowing endocrinologists and primary care physicians to screen patients who would otherwise never reach an ophthalmologist. In underserved communities with limited specialist access, this is a meaningful capability.
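Sensitivity and specificity describe the test itself. What a clinic actually sees also depends on how common the condition is among the patients it screens. Bayes' rule turns the reported 87.2% sensitivity and 90.7% specificity into positive and negative predictive values. The prevalence values in this sketch are illustrative assumptions, not figures from the IDx-DR trial.

```python
def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV) for a screening test via Bayes' rule."""
    tp = sensitivity * prevalence              # diseased and flagged
    fn = (1 - sensitivity) * prevalence        # diseased and missed
    tn = specificity * (1 - prevalence)        # healthy and cleared
    fp = (1 - specificity) * (1 - prevalence)  # healthy and flagged
    return tp / (tp + fp), tn / (tn + fn)


# Assumed prevalences, chosen only to show the trend.
for prevalence in (0.05, 0.10, 0.25):
    ppv, npv = predictive_values(0.872, 0.907, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

At an assumed 10% prevalence, about half of positive screens are true positives while a negative screen is correct more than 98% of the time. This is one reason a positive result is treated as a referral to eye care rather than a final diagnosis.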
Stroke triage: RapidAI’s stroke AI has been deployed in over 2,000 hospitals globally. It analyzes CT perfusion scans to identify salvageable brain tissue, generating an actionable report in under 2 minutes versus 30–60 minutes for manual review. Door-to-treatment time, the critical metric in stroke outcomes, has decreased by an average of 54 minutes at sites using the system.
These are not marginal improvements. In medicine, minutes and percentage points translate directly into lives.
Genomics and Personalized Medicine
Polygenic Risk Scores (PRS) aggregate the effects of thousands of common genetic variants to predict disease risk. Traditional single-gene testing catches rare, high-penetrance variants like BRCA1/2 in breast cancer. PRS identifies individuals in the top decile of common variant risk — a group whose lifetime risk of conditions like coronary artery disease, type 2 diabetes, or schizophrenia rivals the risk conferred by single high-impact rare variants. Large-scale biobanks (UK Biobank, All of Us, Million Veteran Program) combined with machine learning are rapidly improving PRS accuracy, especially for non-European populations where earlier studies were underpowered.
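At its core, a PRS is a weighted sum. For each variant, the number of effect alleles a person carries (0, 1, or 2) is multiplied by that allele's effect size, typically a log odds ratio estimated in a genome-wide association study, and the products are added up. Ranking the result against a reference cohort gives the percentile used to define groups such as the top decile. The minimal sketch below uses three made-up variants: the IDs `rsA`, `rsB`, and `rsC` and their weights are placeholders, not real associations. Real scores use thousands to millions of variants.

```python
from bisect import bisect_left

# Hypothetical effect sizes (log odds ratio per effect allele); placeholder IDs.
WEIGHTS = {"rsA": 0.12, "rsB": -0.05, "rsC": 0.30}


def polygenic_score(dosages: dict[str, int],
                    weights: dict[str, float] = WEIGHTS) -> float:
    """Sum of effect-allele dosage (0, 1, 2) times effect size.

    Missing genotypes count as 0 here, a simplification; real pipelines
    impute them or substitute a population mean dosage.
    """
    return sum(w * dosages.get(variant, 0) for variant, w in weights.items())


def percentile(score: float, reference_scores: list[float]) -> float:
    """Percentage of a reference cohort scoring strictly below this score."""
    ordered = sorted(reference_scores)
    return 100 * bisect_left(ordered, score) / len(ordered)


person = {"rsA": 2, "rsB": 1, "rsC": 1}
reference = [i / 10 for i in range(10)]  # toy reference cohort: 0.0 ... 0.9
score = polygenic_score(person)
print(f"score={score:.2f}, percentile={percentile(score, reference):.0f}")
```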
Pharmacogenomics — using genetic variation to predict drug response — is moving from research curiosity to clinical standard. The FDA now includes pharmacogenomic information in prescribing labels for over 200 drugs. Variants in CYP2D6, CYP2C19, and other metabolizing enzymes affect how patients process antidepressants, anticoagulants, and opioids. AI systems that integrate pharmacogenomic profiles with medication records can flag dangerous drug-gene interactions before a prescription is filled.
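At its simplest, a drug-gene interaction check is a lookup keyed on gene, metabolizer phenotype, and drug. In the minimal sketch below, the three rules paraphrase well-known CPIC recommendations: avoid codeine in CYP2D6 ultrarapid and poor metabolizers, and consider an alternative antiplatelet agent to clopidogrel in CYP2C19 poor metabolizers. The rule table, `Alert` type, and `check_prescription` function are hypothetical and are not a clinical knowledge base. A real system would draw rules from curated sources and also handle the translation from genotype to phenotype.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    drug: str
    gene: str
    phenotype: str
    message: str


# Hypothetical rule table: (gene, phenotype, drug) -> advisory text.
RULES = {
    ("CYP2D6", "ultrarapid metabolizer", "codeine"):
        "Toxicity risk from increased morphine formation; consider a non-codeine analgesic.",
    ("CYP2D6", "poor metabolizer", "codeine"):
        "Likely reduced analgesic effect; consider a non-codeine analgesic.",
    ("CYP2C19", "poor metabolizer", "clopidogrel"):
        "Reduced activation of clopidogrel; consider an alternative antiplatelet agent.",
}


def check_prescription(drug: str, phenotypes: dict[str, str]) -> list[Alert]:
    """Return advisories for a proposed drug given a patient's gene -> phenotype map."""
    alerts = []
    for gene, phenotype in phenotypes.items():
        message = RULES.get((gene, phenotype, drug.lower()))
        if message:
            alerts.append(Alert(drug, gene, phenotype, message))
    return alerts


patient = {"CYP2D6": "ultrarapid metabolizer", "CYP2C19": "normal metabolizer"}
for alert in check_prescription("Codeine", patient):
    print(f"{alert.drug}: {alert.gene} {alert.phenotype}. {alert.message}")
```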
Clinical AI: Hospital Systems and the EHR Challenge
Electronic Health Records contain decades of structured and unstructured clinical data — diagnosis codes, lab values, medication lists, physician notes — that represent an unprecedented opportunity for clinical AI. The challenge is that this data is messy, inconsistently structured, and siloed across incompatible systems.
Epic Systems, the dominant EHR platform, has integrated AI predictions directly into clinical workflows. Its Deterioration Index, a real-time risk score for patient deterioration, is now active across hundreds of hospitals, generating alerts when a hospitalized patient’s trajectory suggests imminent decompensation. Its sepsis prediction model is designed to flag patients at risk of sepsis hours before clinical criteria are met, enabling earlier antibiotic administration and IV fluid resuscitation, although an independent external validation at Michigan Medicine (Wong et al., JAMA Internal Medicine, 2021) found that it performed substantially worse than its developer had reported.
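For researchers and developers building clinical AI, the MIMIC-III and MIMIC-IV datasets (built by the MIT Laboratory for Computational Physiology from Beth Israel Deaconess Medical Center records and distributed through PhysioNet) provide de-identified EHR data for over 60,000 ICU admissions. They are the most widely used publicly available datasets for clinical ML research, although access requires PhysioNet credentialing and a signed data use agreement. Key considerations when working with clinical data: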
- Temporal leakage: Features available at prediction time must be carefully distinguished from features only available retrospectively.
- Distribution shift: A model trained on one hospital’s patient population may perform poorly at another due to differences in patient demographics, treatment protocols, and documentation practices.
- Label noise: Diagnosis codes reflect billing decisions as much as clinical reality. “Heart failure” in ICD-10 covers a heterogeneous population with different pathophysiology.
- Class imbalance: Adverse events (sepsis, readmission, mortality) are rare. Standard accuracy metrics are misleading — always report AUROC, precision-recall curves, and calibration.
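Two of these pitfalls can be demonstrated in a few lines. The sketch below works under simplified assumptions: a hypothetical event list of (timestamp, feature name, value) tuples, and pure Python rather than any particular ML library. The first part keeps only values charted before the prediction time, so a diagnosis recorded days later cannot leak into the features. The second part shows why accuracy misleads on a rare outcome. A "model" that never predicts sepsis scores 98% accuracy when 2 of 100 admissions are septic. Its AUROC, computed here as the probability that a random positive case outranks a random negative one, is only 0.5.

```python
from datetime import datetime


def features_at(events, prediction_time):
    """Latest value of each feature recorded strictly before prediction_time."""
    latest = {}
    for ts, name, value in sorted(events):
        if ts < prediction_time:  # later entries would leak future information
            latest[name] = value
    return latest


def accuracy(labels, predictions):
    return sum(y == p for y, p in zip(labels, predictions)) / len(labels)


def auroc(labels, scores):
    """Probability a random positive scores above a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


events = [
    (datetime(2024, 1, 1, 8), "lactate", 1.8),
    (datetime(2024, 1, 1, 14), "lactate", 4.2),
    (datetime(2024, 1, 3, 9), "dx_sepsis", 1),  # only known after the fact
]
print(features_at(events, datetime(2024, 1, 1, 12)))  # {'lactate': 1.8}

labels = [1] * 2 + [0] * 98  # 2 sepsis cases among 100 admissions
constant = [0.0] * 100       # a "model" that always outputs the same score
print(accuracy(labels, [0] * 100), auroc(labels, constant))  # 0.98 0.5
```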
Regulatory Framework: The FDA’s Evolving Approach
The U.S. Food and Drug Administration regulates most AI diagnostic tools as Software as a Medical Device (SaMD). Device classification is risk-based:
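- Class I (General Controls): Low-risk devices, most of which are exempt from 510(k) premarket notification. Purely administrative software, such as scheduling or workflow tools that make no clinical decisions, is generally not regulated as a device at all.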
- Class II (510(k) clearance): Moderate risk, cleared via substantial equivalence to a predicate device. Most AI diagnostic tools targeting specific conditions fall here.
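- Class III (PMA): High-risk devices, requiring premarket approval backed by clinical evidence of safety and effectiveness. IDx-DR did not take this route: it was authorized through the De Novo pathway, which serves novel low-to-moderate-risk devices that have no predicate, and its authorization created a new Class II device type that later devices can cite as a predicate.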
The FDA’s Predetermined Change Control Plan (PCCP), set out in draft guidance in 2023, allows manufacturers to pre-specify how an AI/ML-based device can be updated after authorization without requiring a new submission for each update. This framework is critical for AI systems that developers plan to retrain on real-world data. Any developer planning to commercialize an AI diagnostic tool should review the FDA’s Action Plan for AI/ML-Based Software as a Medical Device and the agency’s guidance on PCCPs.
Bias in Healthcare AI: A Critical Safety Issue
The Obermeyer et al. study published in Science in 2019 remains the most cited example of algorithmic bias in healthcare. A widely deployed health management algorithm used healthcare cost as a proxy for health need. Because structural racism had resulted in less money being spent on Black patients with the same level of need, the algorithm systematically underestimated their health needs. The authors estimated that correcting this bias would raise the share of Black patients identified for additional care from 17.7% to 46.5%.
Pulse oximeters, devices that measure blood oxygen saturation via light transmission through the skin, performed significantly worse on patients with darker skin pigmentation. During COVID-19, multiple studies showed that pulse oximeters overestimated oxygen saturation in Black patients, masking hypoxemia and contributing to delayed treatment and worse outcomes. The devices had been developed and validated predominantly on lighter-skinned populations.
For anyone building healthcare AI, bias auditing is not optional: it is a patient safety requirement. The recommended practice is to evaluate model performance stratified by race, sex, age, and socioeconomic status before any deployment, and to apply the same disaggregated evaluation to clinical text systems. General-purpose evaluation frameworks such as RAGAS (Retrieval-Augmented Generation Assessment) measure retrieval and answer quality, not subgroup disparities, so they complement this audit rather than replace it. If disaggregated performance is not available, the model should not be deployed.
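Disaggregated evaluation means computing the same metric separately for each subgroup and examining the gaps, not only the overall number. The minimal sketch below assumes hypothetical records of (subgroup, true label, predicted label) and audits sensitivity alone. The 5-percentage-point tolerance is an illustrative assumption, not a regulatory standard. A fuller audit would repeat this for specificity and calibration.

```python
from collections import defaultdict


def sensitivity_by_group(records):
    """True-positive rate per subgroup from (group, label, prediction) records."""
    true_pos = defaultdict(int)
    positives = defaultdict(int)
    for group, label, prediction in records:
        if label == 1:
            positives[group] += 1
            true_pos[group] += prediction == 1
    return {g: true_pos[g] / positives[g] for g in positives}


def flag_gaps(rates, tolerance=0.05):
    """Groups whose rate trails the best-performing group by more than tolerance."""
    best = max(rates.values())
    return sorted(g for g, r in rates.items() if best - r > tolerance)


# Hypothetical evaluation records; group names are placeholders.
records = (
    [("group_a", 1, 1)] * 90 + [("group_a", 1, 0)] * 10
    + [("group_b", 1, 1)] * 72 + [("group_b", 1, 0)] * 28
    + [("group_b", 0, 0)] * 50
)
rates = sensitivity_by_group(records)
print(rates)             # {'group_a': 0.9, 'group_b': 0.72}
print(flag_gaps(rates))  # ['group_b']
```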
Career Paths in Healthcare AI
Clinical NLP Engineer — Builds models that extract structured information from unstructured physician notes, radiology reports, and pathology reports. One of the highest-demand roles in medical AI. Salary range: $140,000–$200,000. Required: strong NLP skills, understanding of clinical terminology (SNOMED-CT, ICD-10, LOINC).
Computational Drug Discovery Scientist — Applies ML to target identification, molecular generation, and ADMET (absorption, distribution, metabolism, excretion, toxicity) prediction. Salary range: $120,000–$180,000. Required: biochemistry or bioinformatics background plus ML proficiency.
Health AI Product Manager — Navigates the regulatory, clinical, and technical requirements for AI medical devices. One of the most strategically important roles as FDA oversight expands. Salary range: $150,000–$220,000.
Clinical Informaticist — Designs and manages clinical data infrastructure (EHR integrations, HL7 FHIR APIs, data warehouses) that enables AI research and deployment. Salary range: $110,000–$160,000.
Healthcare AI is not moving fast and breaking things. It is moving carefully and saving lives. The engineers and researchers who can navigate both the technical and regulatory complexity of this domain are among the most sought-after specialists in the industry — and the work they do matters in ways that most software careers simply do not.