LLM-MINE: Large Language Model based Alzheimer's Disease and Related Dementias Phenotypes Mining from Clinical Notes

📅 2026-03-13
🤖 AI Summary
Accurately extracting Alzheimer's disease and related dementias (ADRD) phenotypes from unstructured electronic health records is crucial for early detection and staging, yet remains challenging. This work proposes LLM-MINE, a framework that combines large language models with few-shot prompting guided by two expert-defined phenotype lists to automatically mine ADRD phenotypes from clinical notes. Chi-square analyses show statistically significant phenotype differences across cohorts, with memory impairment emerging as the strongest discriminator. Few-shot prompting with the combined phenotype lists achieves the best unsupervised clustering performance (ARI = 0.290, NMI = 0.232), substantially outperforming biomedical named entity recognition and dictionary-based baselines and supporting its use for ADRD staging.

πŸ“ Abstract
Accurate extraction of Alzheimer's Disease and Related Dementias (ADRD) phenotypes from electronic health records (EHR) is critical for early-stage detection and disease staging. However, this information is usually embedded in unstructured text rather than tabular data, which makes it difficult to extract accurately. We therefore propose LLM-MINE, a Large Language Model-based phenotype mining framework for automatic extraction of ADRD phenotypes from clinical notes. Using two expert-defined phenotype lists, we evaluate the extracted phenotypes by examining their statistical significance across cohorts and their utility for unsupervised disease staging. Chi-square analyses confirm statistically significant phenotype differences across cohorts, with memory impairment being the strongest discriminator. Few-shot prompting with the combined phenotype lists achieves the best clustering performance (ARI=0.290, NMI=0.232), substantially outperforming biomedical NER and dictionary-based baselines. Our results demonstrate that LLM-based phenotype extraction is a promising tool for discovering clinically meaningful ADRD signals from unstructured notes.
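
For readers unfamiliar with the two clustering metrics quoted in the abstract, the following is a minimal pure-Python sketch of how an Adjusted Rand Index (ARI) and a Normalized Mutual Information (NMI) score can be computed from reference labels and cluster assignments. It is an illustration only: the function names and the toy labels are hypothetical, the arithmetic-mean normalization for NMI is an assumption, and the listing does not say which implementation or NMI variant the authors used.

```python
from collections import Counter
from math import comb, log


def adjusted_rand_index(true_labels, pred_labels):
    """Pair-counting agreement between two labelings, corrected for chance."""
    n = len(true_labels)
    if n < 2:
        return 1.0  # no pairs to compare; treat as trivially identical
    cells = Counter(zip(true_labels, pred_labels))  # contingency table cells
    rows = Counter(true_labels)                     # reference class sizes
    cols = Counter(pred_labels)                     # cluster sizes
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())
    expected = sum_rows * sum_cols / comb(n, 2)     # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:                       # degenerate labelings
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def normalized_mutual_info(true_labels, pred_labels):
    """Mutual information divided by the mean of the two label entropies.

    Assumption: arithmetic-mean normalization; other variants exist.
    """
    n = len(true_labels)
    cells = Counter(zip(true_labels, pred_labels))
    rows = Counter(true_labels)
    cols = Counter(pred_labels)
    mi = sum(
        (c / n) * log(c * n / (rows[a] * cols[b]))
        for (a, b), c in cells.items()
    )
    h_true = -sum((c / n) * log(c / n) for c in rows.values())
    h_pred = -sum((c / n) * log(c / n) for c in cols.values())
    denom = (h_true + h_pred) / 2
    return mi / denom if denom > 0 else 1.0


# Toy example (made-up labels, not data from the paper):
# reference disease stages vs. cluster ids derived from extracted phenotypes.
stages = ["early", "early", "mid", "mid", "late", "late"]
clusters = [0, 0, 1, 2, 2, 2]
print("ARI:", adjusted_rand_index(stages, clusters))
print("NMI:", normalized_mutual_info(stages, clusters))
```

Both scores equal 1 when the clustering reproduces the reference grouping up to a renaming of cluster ids. ARI is about 0 for chance-level assignments and can be negative; NMI ranges from 0 to 1.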
Problem

Research questions and friction points this paper is trying to address.

Alzheimer's Disease
Related Dementias
phenotype extraction
clinical notes
electronic health records
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large Language Model
Phenotype Mining
Few-shot Prompting
Alzheimer's Disease
Clinical Notes