🤖 AI Summary
This study addresses the challenge of modeling complex patient trajectories in longitudinal electronic health records (EHRs) for early multi-cancer risk prediction by proposing TrajOnco, a training-free multi-agent large language model (LLM) framework. Combining a chain-of-agents architecture with long-term memory, TrajOnco performs zero-shot temporal reasoning to generate patient-level summaries, evidence-linked rationales, and cancer risk scores. Evaluated on 15 cancer types for risk of diagnosis at 1 year, the framework achieves AUROCs of 0.64 to 0.80, performs comparably to supervised machine learning on a lung cancer benchmark, and shows better temporal reasoning than single-agent LLMs. Human evaluation validated the fidelity of its outputs, and its aggregated reasoning outputs reveal population-level risk patterns consistent with established clinical knowledge.
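For readers unfamiliar with the metric, AUROC can be read as the probability that a randomly chosen case receives a higher risk score than a randomly chosen control, with ties counted as one half. The sketch below computes it by direct pairwise comparison. The function name and the scores are illustrative assumptions and are not taken from the study.

```python
from typing import Sequence


def auroc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """Pairwise AUROC: fraction of (case, control) pairs ranked correctly.

    Illustrative helper only; the study's evaluation code is not described
    in the abstract.
    """
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0   # case correctly ranked above control
            elif case == control:
                wins += 0.5   # ties count as half a correct ranking
    return wins / (len(case_scores) * len(control_scores))


# Made-up scores: three of the four case/control pairs are ordered correctly.
example = auroc([0.9, 0.6], [0.7, 0.2])
```

On this reading, an AUROC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of cases from controls.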
📝 Abstract
Accurate estimation of cancer risk from longitudinal electronic health records (EHRs) could support earlier detection and improved care, but modeling such complex patient trajectories remains challenging. We present TrajOnco, a training-free, multi-agent large language model (LLM) framework designed for scalable multi-cancer early detection. Using a chain-of-agents architecture with long-term memory, TrajOnco performs temporal reasoning over sequential clinical events to generate patient-level summaries, evidence-linked rationales, and predicted risk scores. We evaluated TrajOnco on de-identified Truveta EHR data across 15 cancer types using matched case-control cohorts, predicting risk of cancer diagnosis at 1 year. In zero-shot evaluation, TrajOnco achieved AUROCs of 0.64-0.80, performing comparably to supervised machine learning in a lung cancer benchmark while demonstrating better temporal reasoning than single-agent LLMs. The multi-agent design also enabled effective temporal reasoning with smaller-capacity models such as GPT-4.1-mini. The fidelity of TrajOnco's output was validated through human evaluation. Furthermore, TrajOnco's interpretable reasoning outputs can be aggregated to reveal population-level risk patterns that align with established clinical knowledge. These findings highlight the potential of multi-agent LLMs to execute interpretable temporal reasoning over longitudinal EHRs, advancing both scalable multi-cancer early detection and clinical insight generation.
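The chain-of-agents idea described above can be sketched roughly as follows: worker agents read the patient timeline one window at a time, each appending a note to a shared long-term memory, and a final manager agent turns the accumulated memory into a risk score. This is a minimal sketch under stated assumptions, not the authors' implementation. `Event`, `Memory`, `run_chain`, the prompt wording, and the window size are all hypothetical, and the LLM is represented by any function that maps a prompt string to a completion string.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Assumption: an "LLM" here is any callable from prompt text to completion text.
LLM = Callable[[str], str]


@dataclass
class Event:
    date: str  # ISO date string, so lexical order matches chronological order
    text: str  # free-text description of the clinical event


@dataclass
class Memory:
    """Hypothetical long-term memory: an append-only list of notes."""
    notes: List[str] = field(default_factory=list)

    def render(self) -> str:
        return "\n".join(self.notes)


def chunk(events: List[Event], size: int) -> List[List[Event]]:
    """Sort events chronologically and split them into fixed-size windows."""
    ordered = sorted(events, key=lambda e: e.date)
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def run_chain(events: List[Event], worker: LLM, manager: LLM,
              chunk_size: int = 2) -> Tuple[Memory, float]:
    """Pass each window to a worker along with the memory built so far,
    then ask the manager for a final risk score."""
    memory = Memory()
    for window in chunk(events, chunk_size):
        window_text = "\n".join(f"{e.date}: {e.text}" for e in window)
        prompt = (
            f"Memory so far:\n{memory.render()}\n\n"
            f"New events:\n{window_text}\n\n"
            "Update the patient summary."
        )
        memory.notes.append(worker(prompt))
    final = manager(
        f"Patient summary:\n{memory.render()}\n\n"
        "Return a cancer risk score between 0 and 1."
    )
    # Assumes the manager replies with a bare number; real output needs parsing.
    return memory, float(final)
```

Because each worker sees only one window plus the condensed memory, no single prompt has to hold the full record, which is one plausible reason a design of this kind could help smaller-capacity models reason over long timelines.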