🤖 AI Summary
This study addresses the challenge of automatically mapping patients' phenotypic manifestations, as described in clinical text, to the standardized Human Phenotype Ontology (HPO). Methodologically, it evaluates the end-to-end performance of GPT-4, compared with GPT-3.5 Turbo, on OMIM clinical summaries across three steps: sign identification, categorization, and normalization to HPO terms. Text retrieval and the LLM steps are coordinated through APIs in an automated pipeline. Results show that GPT-4's agreement with manual annotators on sign identification and categorization is comparable to the agreement between the annotators themselves. However, retrieving the correct HPO ID for a normalized term remains a weakness. The authors suggest retrieval-augmented generation (RAG), changes in pre-training, and additional fine-tuning as possible remedies. The core contribution is evidence that LLMs can support phenotype standardization, with identification and categorization already at annotator-level agreement, together with a reusable, API-coordinated pipeline for high-throughput phenotyping of free text in precision medicine.
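The HPO ID weakness arises when a model has to recall an identifier from memory. One alternative, sketched minimally here under the assumption that the pipeline already has an LLM-normalized term, is to resolve the ID by looking it up in the ontology itself. Python's standard-library `difflib` fuzzy matching stands in for a real retrieval step (a RAG setup would more typically use an embedding index). The four-term `HPO_LABELS` table and the `ground_hpo_id` helper are hypothetical; a real system would load the full ontology release.

```python
import difflib

# Hypothetical miniature HPO label index for illustration only; a real system
# would load every term from the official HPO release instead of hard-coding.
HPO_LABELS = {
    "Seizure": "HP:0001250",
    "Intellectual disability": "HP:0001249",
    "Microcephaly": "HP:0000252",
    "Short stature": "HP:0004322",
}


def ground_hpo_id(normalized_term, labels=HPO_LABELS, cutoff=0.8):
    """Map an LLM-normalized term to an HPO ID by lookup, not by model recall.

    Returns (label, hpo_id) for the closest label whose similarity ratio is
    at least `cutoff`, or None if no label is close enough.
    """
    # Compare case-insensitively, but return the canonical label.
    index = {label.lower(): label for label in labels}
    matches = difflib.get_close_matches(
        normalized_term.lower(), list(index), n=1, cutoff=cutoff
    )
    if not matches:
        return None
    label = index[matches[0]]
    return label, labels[label]


print(ground_hpo_id("seizures"))     # ('Seizure', 'HP:0001250')
print(ground_hpo_id("photophobia"))  # None: not in the toy index
```

Because the ID is read from the ontology table rather than generated, an error can only be a wrong label match, never a malformed or nonexistent ID; the `cutoff` parameter trades recall against false matches.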
📝 Abstract
High-throughput phenotyping automates the mapping of patient signs to standardized concepts, such as those in the Human Phenotype Ontology (HPO), a process critical to precision medicine. We evaluated the automated phenotyping of clinical summaries from the Online Mendelian Inheritance in Man (OMIM) database using large language models. Various APIs were used to automate text retrieval, sign identification, categorization, and normalization. GPT-4 outperformed GPT-3.5 Turbo in identifying, categorizing, and normalizing signs, achieving concordance with manual annotators comparable to that between manual annotators. While GPT-4 demonstrates high accuracy in sign identification and categorization, limitations remain in sign normalization, particularly in retrieving the correct HPO ID for a normalized term. Methods such as retrieval-augmented generation, changes in pre-training, and additional fine-tuning may help address these limitations. The combination of APIs with large language models presents a promising approach for high-throughput phenotyping of free text.
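To make the comparison between model-vs-annotator and annotator-vs-annotator concordance concrete, here is a small sketch for a single clinical summary. The abstract does not name its agreement metric, so the Dice (F1) overlap of HPO ID sets used here is an assumption. The helper names and the example annotations are also hypothetical and chosen only for illustration.

```python
from itertools import combinations


def dice(a, b):
    """Dice/F1 overlap of two annotation sets; 1.0 means identical sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty annotations agree trivially
    return 2 * len(a & b) / (len(a) + len(b))


def inter_annotator(annotations):
    """Mean pairwise Dice across all pairs of manual annotators."""
    pairs = list(combinations(annotations, 2))
    return sum(dice(x, y) for x, y in pairs) / len(pairs)


def model_vs_annotators(model_set, annotations):
    """Mean Dice between the model output and each manual annotator."""
    return sum(dice(model_set, ann) for ann in annotations) / len(annotations)


# Hypothetical HPO ID sets assigned to one clinical summary.
annotator_1 = {"HP:0001250", "HP:0001249", "HP:0000252"}
annotator_2 = {"HP:0001250", "HP:0001249"}
model_output = {"HP:0001250", "HP:0000252", "HP:0004322"}

human = inter_annotator([annotator_1, annotator_2])
model = model_vs_annotators(model_output, [annotator_1, annotator_2])
print(f"inter-annotator: {human:.3f}, model vs annotators: {model:.3f}")
# inter-annotator: 0.800, model vs annotators: 0.533
```

In this toy case the model falls short of annotator-level agreement; a real evaluation would average such scores over many summaries before comparing the two levels.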