A medical coding language model trained on clinical narratives from a population-wide cohort of 1.8 million patients

📅 2026-02-27
🤖 AI Summary
This study addresses the inefficiency of manual medical coding and the systematic under-coding of secondary diagnoses. The authors trained a language model on 5.8 million electronic health records from 1.8 million patients in Eastern Denmark (2006–2016), a population-scale cohort covering nearly all medical specialties. The model predicts ICD-10 codes from clinical notes, medication records, and laboratory results. Evaluated on 270,000 held-out patients, it achieves a micro-averaged F1 score of 71.8% and a top-10 recall of 95.5%. For three codes that appear mainly as secondary diagnoses (suicide-related behaviors, weight disorders, and hypertension), it identified thousands of uncoded cases, 76–86% of which were confirmed as valid upon manual review. The model can automate coding for approximately 50% of cases and may help capture missed secondary conditions relevant to epidemiological and multimorbidity research.

📝 Abstract
Medical coding translates clinical documentation into standardized codes for billing, research, and public health, but manual coding is time-consuming and error-prone. Existing automation efforts rely on small datasets that poorly represent real-world patient heterogeneity. We trained a language model on 5.8 million electronic health records from 1.8 million patients across nearly all specialties in Eastern Denmark (2006–2016) to predict ICD-10 codes from clinical notes, medications, and laboratory results. Evaluated on 270,000 held-out patients, the model achieved a micro F1 of 71.8% and a top-10 recall of 95.5%. Performance varied by specialty (F1: 53–91%), with higher scores in specialties with well-defined diagnostic criteria. Codes appearing predominantly as secondary diagnoses had markedly lower F1 scores. For three such codes (suicide-related behaviors, weight disorders, and hypertension), the model identified thousands of uncoded cases, of which 76–86% were confirmed valid upon manual review, suggesting systematic under-coding rather than model error. These findings suggest under-coding of secondary diagnoses in Eastern Denmark during this period, with potential implications for epidemiological research, public health surveillance, and understanding of multimorbidity. Similar time constraints and reimbursement structures in other healthcare systems suggest this may not be isolated to this dataset. The model can automate coding for approximately 50% of cases and provide accurate suggestions for most others, and may offer a practical solution to help capture missed secondary conditions.
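The two headline metrics can be illustrated with a minimal sketch. It assumes that micro F1 pools true positives, false positives, and false negatives over all records and codes, and that top-k recall is the share of assigned codes found among the k highest-scored codes per record. The abstract does not spell out the exact definitions the authors used, so these may differ. The function names, ICD-10 codes, and scores below are hypothetical toy values, not data from the study.

```python
from typing import Dict, List, Set


def micro_f1(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Micro-averaged F1: pool TP/FP/FN over all records and codes."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    fp = sum(len(p - g) for g, p in zip(gold, pred))
    fn = sum(len(g - p) for g, p in zip(gold, pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def recall_at_k(gold: List[Set[str]], scores: List[Dict[str, float]], k: int) -> float:
    """Share of gold codes that appear among the k highest-scored codes per record."""
    hits = 0
    total = 0
    for g, s in zip(gold, scores):
        top_k = set(sorted(s, key=s.get, reverse=True)[:k])
        hits += len(g & top_k)
        total += len(g)
    return hits / total if total else 0.0


# Hypothetical toy records: assigned codes, thresholded predictions, raw scores.
gold = [{"I10", "E66"}, {"J18"}]
pred = [{"I10"}, {"J18", "I10"}]
scores = [
    {"I10": 0.9, "E66": 0.4, "J18": 0.1},
    {"J18": 0.8, "I10": 0.6, "E66": 0.2},
]

print(f"micro F1:   {micro_f1(gold, pred):.3f}")
print(f"recall@2:   {recall_at_k(gold, scores, k=2):.3f}")
```

In the first toy record the code E66 is assigned but not predicted, which lowers micro F1 even though it ranks second by score. This mirrors the gap the abstract reports between micro F1 (71.8%) and top-10 recall (95.5%): a code can be missed by the final prediction yet still appear high in the ranked suggestions.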
Problem

Research questions and friction points this paper is trying to address.

medical coding
under-coding
secondary diagnoses
ICD-10
clinical narratives
Innovation

Methods, ideas, or system contributions that make the work stand out.

medical coding
large-scale clinical language model
ICD-10 prediction
under-coding detection
electronic health records
Joakim Edin
Section for Health Data Science and AI, Department of Public Health, University of Copenhagen, Denmark
Sedrah Butt Balaganeshan
Section for Health Data Science and AI, Department of Public Health, University of Copenhagen, Denmark
Annike Kjølby Kristensen
Section for Health Data Science and AI, Department of Public Health, University of Copenhagen, Denmark
Lars Maaløe
Co-Founder & CTO @ Corti | Adj. Assoc. Professor of Machine Learning @ DTU
Machine Learning
Ioannis Louloudis
Section for Health Data Science and AI, Department of Public Health, University of Copenhagen, Denmark
Søren Brunak
Professor of Disease Systems Biology, University of Copenhagen
Bioinformatics, Systems Biology, Systems Medicine, Digital Health, Data Sciences