🤖 AI Summary
This study introduces a nested entity recognition task for monitoring cognitive processes in the daily self-reported texts of Parkinson’s disease (PD) patients, with the aim of enabling low-burden, longitudinal tracking of cognitive and affective changes. Methodologically, it compares three paradigms: Bio_ClinicalBERT span categorization, QLoRA-fine-tuned Llama-3-8B-Instruct, and zero- and few-shot prompting of GPT-4o mini. The fine-tuned Llama-3-8B-Instruct achieves the best overall performance (micro-F1 = 0.74, macro-F1 = 0.59) and is strongest on context-dependent, semantically abstract categories such as “thought” and “social interaction,” where Bio_ClinicalBERT fails. To our knowledge, this is the first work to adapt large language models for fine-grained, overlapping cognitive category extraction from naturalistic patient narratives. The results support the feasibility of using LLMs to complement clinical neuropsychological assessment.
📝 Abstract
Understanding how individuals with Parkinson's disease (PD) describe cognitive experiences in their daily lives can offer valuable insights into disease-related cognitive and emotional changes. However, extracting such information from unstructured patient narratives is challenging due to the subtle, overlapping nature of cognitive constructs. This study developed and evaluated natural language processing (NLP) models to automatically identify categories that reflect various cognitive processes from de-identified first-person narratives. Three model families were compared on the extraction of seven categories: a Bio_ClinicalBERT-based span categorization model for nested entity recognition, a Meta-Llama-3-8B-Instruct model fine-tuned with QLoRA for instruction following, and GPT-4o mini evaluated under zero- and few-shot settings. Model performance varied substantially across categories and model families. The fine-tuned Meta-Llama-3-8B-Instruct achieved the highest overall F1-scores (0.74 micro-average and 0.59 macro-average), particularly excelling in context-dependent categories such as thought and social interaction. Bio_ClinicalBERT exhibited high precision but low recall; it performed comparably to Llama on some categories, such as location and time, but failed on others, such as thought, emotion, and social interaction. Compared to conventional information extraction tasks, this task presents a greater challenge due to the abstract and overlapping nature of narrative accounts of complex cognitive processes. Nonetheless, with continued refinement, these NLP systems hold promise for enabling low-burden, longitudinal monitoring of cognitive function and serving as a valuable complement to formal neuropsychological assessments in PD.
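The gap between the reported micro-average (0.74) and macro-average (0.59) F1 is what one would expect when some categories are extracted far less reliably than others. The following is a minimal sketch of how the two averages can diverge for nested spans. It is an illustration and not the study's evaluation code: the spans, offsets, and category assignments are hypothetical, and exact-match scoring on (start, end, category) is an assumption, since the abstract does not state the matching criterion.

```python
from collections import Counter

# Hypothetical gold and predicted spans: (start_char, end_char, category).
# Spans may nest or overlap, so each one is scored independently.
gold = {
    (0, 9, "time"),
    (13, 49, "thought"),
    (33, 49, "location"),   # nested inside the "thought" span
    (54, 70, "emotion"),
}
pred = {
    (0, 9, "time"),
    (33, 49, "location"),
    (54, 70, "thought"),    # correct boundaries, wrong category
}


def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def score(gold, pred):
    """Exact-match scoring (an assumed criterion) per category, then averaged."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for span in pred:
        (tp if span in gold else fp)[span[2]] += 1
    for span in gold - pred:
        fn[span[2]] += 1
    categories = sorted({s[2] for s in gold | pred})
    per_cat = {c: f1(tp[c], fp[c], fn[c]) for c in categories}
    # Micro: pool counts over all categories, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: unweighted mean of per-category F1, so weak categories count fully.
    macro = sum(per_cat.values()) / len(per_cat)
    return per_cat, micro, macro


per_cat, micro, macro = score(gold, pred)
print(per_cat)
print(f"micro-F1 = {micro:.2f}, macro-F1 = {macro:.2f}")
```

In this toy case the concrete categories (time, location) are recovered and the abstract ones (thought, emotion) are missed, so the macro average falls below the micro average, the same direction as the gap reported above.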