An artificial intelligence framework for end-to-end rare disease phenotyping from clinical notes using large language models

📅 2026-02-23

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the labor-intensive manual extraction and standardization of phenotypic information in rare disease care. Existing AI approaches typically optimize individual steps rather than modeling the full clinical workflow. The authors propose RARE-PHENIX, a framework that formulates clinical phenotyping as an end-to-end pipeline comprising large language model (LLM)-driven phenotype extraction, Human Phenotype Ontology (HPO)-based standardization, and supervised ranking of diagnosis-relevant phenotypes. RARE-PHENIX was trained on data from 2,671 patients across 11 Undiagnosed Diseases Network clinical sites and externally validated on 16,357 clinical notes from Vanderbilt University Medical Center. In end-to-end evaluation against clinician-curated HPO terms, it consistently outperformed the PhenoBERT baseline on ontology-based similarity (0.70 vs. 0.58) and on precision, recall, and F1.
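The three-stage structure described in the summary (extraction, standardization, ranking) can be sketched as three composable functions. This is a toy illustration, not the authors' implementation: the substring matcher stands in for the LLM extractor, and `LEXICON`, `WEIGHTS`, and all function names are hypothetical. The weights are invented numbers standing in for a trained ranking model.

```python
from dataclasses import dataclass

# Toy lexicon: surface phrase -> (HPO ID, label). Illustrative entries only;
# a real system would resolve mentions against the full ontology.
LEXICON = {
    "seizures": ("HP:0001250", "Seizure"),
    "small head": ("HP:0000252", "Microcephaly"),
    "developmental delay": ("HP:0001263", "Global developmental delay"),
}

# Hypothetical relevance weights standing in for a supervised ranker.
WEIGHTS = {"HP:0000252": 0.9, "HP:0001250": 0.7, "HP:0001263": 0.4}


@dataclass(frozen=True)
class Phenotype:
    hpo_id: str
    label: str
    score: float


def extract_mentions(note: str) -> list[str]:
    """Stage 1 stand-in: substring matching instead of an LLM.

    Note that this toy matcher ignores negation ("no seizures").
    """
    text = note.lower()
    return [phrase for phrase in LEXICON if phrase in text]


def standardize(mentions: list[str]) -> dict[str, str]:
    """Stage 2: map free-text mentions to HPO IDs and drop duplicates."""
    terms: dict[str, str] = {}
    for mention in mentions:
        hpo_id, label = LEXICON[mention]
        terms.setdefault(hpo_id, label)
    return terms


def rank(terms: dict[str, str]) -> list[Phenotype]:
    """Stage 3: order terms by (hypothetical) diagnostic relevance."""
    scored = [Phenotype(i, lbl, WEIGHTS.get(i, 0.0)) for i, lbl in terms.items()]
    return sorted(scored, key=lambda p: p.score, reverse=True)


def phenotype_note(note: str) -> list[Phenotype]:
    """Run the full extraction -> standardization -> ranking chain."""
    return rank(standardize(extract_mentions(note)))


result = phenotype_note("Patient has seizures and a small head.")
for p in result:
    print(p.hpo_id, p.label, p.score)
```

Keeping each stage behind its own function mirrors the ablation setup mentioned in the abstract, where modules are added one at a time and the output is re-evaluated.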

📝 Abstract

Phenotyping is fundamental to rare disease diagnosis, but manual curation of structured phenotypes from clinical notes is labor-intensive and difficult to scale. Existing artificial intelligence approaches typically optimize individual components of phenotyping but do not operationalize the full clinical workflow of extracting features from clinical text, standardizing them to Human Phenotype Ontology (HPO) terms, and prioritizing diagnostically informative HPO terms. We developed RARE-PHENIX, an end-to-end AI framework for rare disease phenotyping that integrates large language model-based phenotype extraction, ontology-grounded standardization to HPO terms, and supervised ranking of diagnostically informative phenotypes. We trained RARE-PHENIX using data from 2,671 patients across 11 Undiagnosed Diseases Network clinical sites, and externally validated it on 16,357 real-world clinical notes from Vanderbilt University Medical Center. Using clinician-curated HPO terms as the gold standard, RARE-PHENIX consistently outperformed a state-of-the-art deep learning baseline (PhenoBERT) across ontology-based similarity and precision, recall, and F1 metrics in end-to-end evaluation (e.g., ontology-based similarity of 0.70 vs. 0.58). Ablation analyses demonstrated performance improvements with the addition of each module in RARE-PHENIX (extraction, standardization, and prioritization), supporting the value of modeling the full clinical phenotyping workflow. By modeling phenotyping as a clinically aligned workflow rather than a single extraction task, RARE-PHENIX provides structured, ranked phenotypes that are more concordant with clinician curation and has the potential to support human-in-the-loop rare disease diagnosis in real-world settings.
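To make the two families of evaluation metrics concrete, the sketch below computes exact-match precision, recall, and F1 between a predicted and a gold term set, alongside one simple ontology-aware score. The abstract does not state which ontology-based similarity measure was used, so the ancestor-closure Jaccard here is an assumed stand-in. `TOY_PARENTS` is a hypothetical miniature hierarchy with made-up term names, not real HPO structure.

```python
def prf1(predicted: set[str], gold: set[str]) -> tuple[float, float, float]:
    """Exact-match precision, recall, and F1 between two term sets."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy hierarchy: term -> list of parent terms.
TOY_PARENTS = {
    "root": [],
    "neuro": ["root"],
    "seizure": ["neuro"],
    "focal_seizure": ["seizure"],
    "growth": ["root"],
    "short_stature": ["growth"],
}


def with_ancestors(terms: set[str]) -> set[str]:
    """Expand a term set with every ancestor reachable in TOY_PARENTS."""
    closed: set[str] = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term not in closed:
            closed.add(term)
            stack.extend(TOY_PARENTS[term])
    return closed


def ancestor_jaccard(predicted: set[str], gold: set[str]) -> float:
    """Jaccard overlap of ancestor-expanded sets (assumed similarity measure)."""
    a, b = with_ancestors(predicted), with_ancestors(gold)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


predicted = {"seizure", "short_stature"}
gold = {"focal_seizure", "short_stature"}

precision, recall, f1 = prf1(predicted, gold)
similarity = ancestor_jaccard(predicted, gold)
print(precision, recall, f1, round(similarity, 3))
```

In this toy case the prediction names a parent of the gold term, so exact-match F1 is 0.5 while the ancestor-aware score is about 0.83. That gap is the reason ontology-based similarity is reported alongside precision, recall, and F1: near misses in the hierarchy earn partial credit.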

Problem

Research questions and friction points this paper is trying to address.

rare disease

phenotyping

clinical notes

Human Phenotype Ontology

diagnosis

Innovation

Methods, ideas, or system contributions that make the work stand out.

end-to-end phenotyping

large language models

Human Phenotype Ontology