AI Summary
This work addresses the challenge that existing large language models struggle to capture the non-monotonic symptom progression across multiple clinical visits in long-term dementia prognosis, primarily due to sparse binary reward signals and the absence of explicit symptom trajectory annotations. To overcome these limitations, the authors propose Dementia-R1, a novel framework that integrates cold-start reinforcement learning with pretraining grounded in verifiable clinical metrics. The approach first extracts longitudinal clinical signals from unstructured electronic health records for pretraining and then refines predictions of final clinical outcomes through reinforcement learning. This strategy effectively mitigates the issues of sparse rewards and complex trajectory modeling, achieving an F1 score of 77.03% on real-world data. Notably, the 7B-parameter variant matches GPT-4o's performance on the ADNI benchmark while significantly enhancing the model's ability to capture fluctuating cognitive trajectories.
Abstract
While Large Language Models (LLMs) have shown strong performance on clinical text understanding, they struggle with longitudinal prediction tasks such as dementia prognosis, which require reasoning over complex, non-monotonic symptom trajectories across multiple visits. Standard supervised training lacks explicit annotations for symptom evolution, while direct Reinforcement Learning (RL) is hindered by sparse binary rewards. To address this challenge, we introduce Dementia-R1, an RL-based framework for longitudinal dementia prognosis from unstructured clinical notes. Our approach adopts a Cold-Start RL strategy that pre-trains the model to predict verifiable clinical indices extracted from patient histories, enhancing the capability to reason about disease progression before determining the final clinical status. Extensive experiments demonstrate that Dementia-R1 achieves an F1 score of 77.03% on real-world unstructured clinical datasets. Notably, on the ADNI benchmark, our 7B model rivals GPT-4o, effectively capturing fluctuating cognitive trajectories. Code is available at https://anonymous.4open.science/r/dementiar1-CDB5
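To make the two-stage idea concrete, the following is a minimal sketch of what verifiable rewards for the two stages could look like. It is not the authors' implementation: the `<answer>...</answer>` output format, the function names `index_reward` and `status_reward`, the `tolerance` parameter, and the use of a single numeric cognitive score as the clinical index are all assumptions made for illustration.

```python
import re
from typing import Optional


def parse_answer(completion: str) -> Optional[str]:
    """Return the text inside <answer>...</answer>, or None if absent.

    The tag format is a hypothetical output convention, not taken from the paper.
    """
    match = re.search(r"<answer>(.*?)</answer>", completion, flags=re.DOTALL)
    return match.group(1).strip() if match else None


def index_reward(completion: str, true_score: float, tolerance: float = 0.0) -> float:
    """Cold-start stage (assumed form): reward predicting a verifiable clinical index.

    Returns 1.0 when the predicted numeric score is within `tolerance`
    of the recorded score, otherwise 0.0.
    """
    answer = parse_answer(completion)
    if answer is None:
        return 0.0
    try:
        predicted = float(answer)
    except ValueError:
        # Non-numeric answers cannot be verified against the record.
        return 0.0
    return 1.0 if abs(predicted - true_score) <= tolerance else 0.0


def status_reward(completion: str, true_status: str) -> float:
    """Final stage (assumed form): sparse binary reward on the final clinical status."""
    answer = parse_answer(completion)
    if answer is None:
        return 0.0
    return 1.0 if answer.lower() == true_status.lower() else 0.0


if __name__ == "__main__":
    # Toy completions; the values are invented for illustration only.
    print(index_reward("<think>...</think><answer>23</answer>", 24.0, tolerance=2.0))
    print(status_reward("<answer>Dementia</answer>", "dementia"))
```

Under these assumptions, every visit with a recorded index yields a checkable training signal, whereas the status reward fires only once per patient history. This is one way to read the claim that pre-training on verifiable indices eases the sparse-reward problem.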