🤖 AI Summary
Poor generalizability of lung cancer risk prediction models across diverse populations limits their clinical utility in heterogeneous patient cohorts and real-world settings. To address this, we propose a retrieval-augmented intelligent agent framework built around a cohort-aware dynamic model selection mechanism. It uses FAISS to retrieve similar patient subcohorts from multi-center datasets, then prompts a large language model to reason over the retrieved cohort's characteristics and the performance metrics of eight candidate models (spanning classical and state-of-the-art methods) to automatically recommend the optimal predictive model. The framework jointly models imaging and structured clinical features to provide personalized risk assessment, and it is designed to improve the identification of high-risk individuals and model adaptability across heterogeneous clinical environments. The approach offers an interpretable and deployable paradigm for precision lung cancer screening.
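The retrieval step described above can be sketched in pure Python. This is a minimal, hypothetical illustration, not the authors' implementation: it assumes every patient is already encoded as a fixed-length, standardized feature vector (the abstract does not specify the embedding), swaps FAISS for an exact brute-force nearest-neighbour search, and assigns the query patient to the cohort that most of its k nearest neighbours belong to. The names `knn_search` and `retrieve_cohort`, the value of `k`, and the majority-vote rule are all assumptions.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def knn_search(query: Sequence[float],
               database: Sequence[Sequence[float]],
               k: int) -> List[Tuple[float, int]]:
    """Exact k-nearest-neighbour search (hypothetical stand-in for FAISS).

    Returns (squared L2 distance, row index) pairs, nearest first.
    """
    scored = [
        (sum((q - x) ** 2 for q, x in zip(query, row)), i)
        for i, row in enumerate(database)
    ]
    scored.sort()
    return scored[:k]


def retrieve_cohort(query: Sequence[float],
                    database: Sequence[Sequence[float]],
                    cohort_labels: Sequence[str],
                    k: int = 5) -> str:
    """Return the cohort that most of the k nearest patients belong to (assumed rule)."""
    neighbours = knn_search(query, database, k)
    votes = Counter(cohort_labels[i] for _, i in neighbours)
    # Ties go to the cohort encountered first, i.e. the one holding the nearer patient.
    cohort, _ = votes.most_common(1)[0]
    return cohort


if __name__ == "__main__":
    # Placeholder standardized feature vectors, illustrative only (not study data).
    patients = [[0.0, 0.1], [0.2, 0.0], [3.0, 3.1], [2.9, 3.0], [3.2, 2.8]]
    labels = ["cohort_A", "cohort_A", "cohort_B", "cohort_B", "cohort_B"]
    print(retrieve_cohort([3.0, 3.0], patients, labels, k=3))  # prints "cohort_B"
```

At the scale of nine multi-institutional cohorts, the brute-force loop would typically be replaced by a FAISS index, for example `faiss.IndexFlatL2(d)` populated with `index.add(...)` and queried with `index.search(query, k)`, which performs the same exact squared-L2 search.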
📝 Abstract
Accurate lung cancer risk prediction remains challenging due to substantial variability across patient populations and clinical settings: no single model performs best for all cohorts. To address this, we propose a personalized lung cancer risk prediction agent that dynamically selects the most appropriate model for each patient by combining cohort-specific knowledge with modern retrieval and reasoning techniques. Given a patient's CT scan and structured metadata (including demographic, clinical, and nodule-level features), the agent first performs cohort retrieval using FAISS-based similarity search across nine diverse real-world cohorts to identify the most relevant patient population in a multi-institutional database. Second, a Large Language Model (LLM) is prompted with the retrieved cohort and its associated performance metrics to recommend the optimal prediction algorithm from a pool of eight representative models: classical linear risk models (Mayo, Brock), temporally aware models (TDVIT, DLSTM), and multi-modal computer vision-based approaches (Liao, Sybil, DLS, DLI). This two-stage agent pipeline, retrieval via FAISS followed by reasoning via the LLM, enables dynamic, cohort-aware risk prediction personalized to each patient's profile. As a result, the agent supports flexible, cohort-driven model selection across diverse clinical populations, offering a practical path toward individualized risk assessment in real-world lung cancer screening.
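The second stage, prompting an LLM with the retrieved cohort and its performance metrics, might look roughly like the sketch below. It is hypothetical throughout: the abstract does not give the prompt wording, the metric set, or the LLM used, so this version assumes a per-model AUC table, treats the LLM as any callable that maps a prompt string to a reply string, and validates the reply against the eight named models, falling back to the highest-AUC model when the reply is not a recognized name. `build_prompt`, `select_model`, `CANDIDATE_MODELS`, and the fallback rule are illustrative names and choices, not the authors' API.

```python
from typing import Callable, Mapping

# The eight candidate models named in the abstract.
CANDIDATE_MODELS = ("Mayo", "Brock", "TDVIT", "DLSTM", "Liao", "Sybil", "DLS", "DLI")


def build_prompt(cohort: str,
                 profile: Mapping[str, str],
                 auc_by_model: Mapping[str, float]) -> str:
    """Render the retrieved cohort and its per-model metrics as an LLM prompt."""
    lines = [f"Retrieved cohort: {cohort}", "Cohort characteristics:"]
    lines += [f"- {key}: {value}" for key, value in profile.items()]
    lines.append("AUC of each candidate model on this cohort:")
    lines += [f"- {name}: {auc:.3f}" for name, auc in auc_by_model.items()]
    lines.append("Reply with exactly one model name from: "
                 + ", ".join(CANDIDATE_MODELS))
    return "\n".join(lines)


def select_model(cohort: str,
                 profile: Mapping[str, str],
                 auc_by_model: Mapping[str, float],
                 llm: Callable[[str], str]) -> str:
    """Ask the LLM (hypothetical callable) to pick a model from the candidate pool."""
    reply = llm(build_prompt(cohort, profile, auc_by_model)).strip().rstrip(".")
    for name in CANDIDATE_MODELS:
        if reply.lower() == name.lower():
            return name
    # Fallback when the reply is not a recognized candidate name (assumed rule).
    return max(auc_by_model, key=lambda name: auc_by_model[name])


if __name__ == "__main__":
    def stub_llm(prompt: str) -> str:
        # Stand-in for a real LLM client; always answers "Sybil".
        return "Sybil"

    # AUC values below are placeholders, not results from the paper.
    print(select_model("cohort_B", {"setting": "screening"},
                       {"Brock": 0.70, "Sybil": 0.75}, stub_llm))  # prints "Sybil"
```

Checking the reply against a fixed candidate list is one simple way to keep free-form LLM output from naming a model outside the pool; a real deployment might instead request structured output or log the LLM's rationale alongside the choice.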