Cohort-Aware Agents for Individualized Lung Cancer Risk Prediction Using a Retrieval-Augmented Model Selection Framework

📅 2025-08-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Lung cancer risk prediction models generalize poorly across populations, which limits their clinical utility in heterogeneous cohorts and real-world settings. To address this, the paper proposes a retrieval-augmented agent framework built around cohort-aware dynamic model selection. The agent uses FAISS to retrieve similar patient cohorts from multi-center datasets, then prompts a large language model to reason over the retrieved cohort's characteristics and the performance metrics of eight candidate models, spanning classical and state-of-the-art methods, and to recommend the most suitable predictive model. The framework takes both CT imaging and structured clinical features as input, enabling personalized risk assessment. The authors position it as an adaptable, practical approach to individualized lung cancer screening across heterogeneous clinical environments.

📝 Abstract

Accurate lung cancer risk prediction remains challenging due to substantial variability across patient populations and clinical settings -- no single model performs best for all cohorts. To address this, we propose a personalized lung cancer risk prediction agent that dynamically selects the most appropriate model for each patient by combining cohort-specific knowledge with modern retrieval and reasoning techniques. Given a patient's CT scan and structured metadata -- including demographic, clinical, and nodule-level features -- the agent first performs cohort retrieval using FAISS-based similarity search across nine diverse real-world cohorts to identify the most relevant patient population from a multi-institutional database. Second, a Large Language Model (LLM) is prompted with the retrieved cohort and its associated performance metrics to recommend the optimal prediction algorithm from a pool of eight representative models, including classical linear risk models (e.g., Mayo, Brock), temporally-aware models (e.g., TDVIT, DLSTM), and multi-modal computer vision-based approaches (e.g., Liao, Sybil, DLS, DLI). This two-stage agent pipeline -- retrieval via FAISS and reasoning via LLM -- enables dynamic, cohort-aware risk prediction personalized to each patient's profile. Building on this architecture, the agent supports flexible and cohort-driven model selection across diverse clinical populations, offering a practical path toward individualized risk assessment in real-world lung cancer screening.
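The retrieval stage described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it swaps FAISS for a brute-force L2 nearest-neighbor search in pure Python so that it runs without dependencies (an exact FAISS index such as `IndexFlatL2` would produce the same ranking). The cohort names, the three-feature encoding, and all numbers are hypothetical.

```python
import math

# Hypothetical cohort embeddings, invented for illustration only:
# (age / 100, pack_years / 100, nodule_diameter_mm / 30).
COHORT_VECTORS = {
    "cohort_A": [0.62, 0.45, 0.20],
    "cohort_B": [0.55, 0.10, 0.40],
    "cohort_C": [0.70, 0.60, 0.65],
}


def l2_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve_cohorts(patient_vector, cohort_vectors, k=1):
    """Return the k cohorts closest to the patient as (name, distance) pairs.

    Brute-force stand-in for a FAISS similarity search.
    """
    scored = [
        (name, l2_distance(patient_vector, vec))
        for name, vec in cohort_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Hypothetical patient encoded with the same three features.
patient = [0.68, 0.58, 0.60]
nearest = retrieve_cohorts(patient, COHORT_VECTORS, k=2)
for name, dist in nearest:
    print(f"{name}: distance {dist:.3f}")
```

In this toy setup the closest cohort would then be passed, together with its per-model performance metrics, to the LLM reasoning stage.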

Problem

Research questions and friction points this paper is trying to address.

No single lung cancer risk model performs best for every cohort, so the model must be chosen per patient

Patient populations and clinical settings vary substantially, which limits how well any one model generalizes

Individualized assessment requires combining cohort-specific knowledge with retrieval and LLM reasoning

Innovation

Methods, ideas, or system contributions that make the work stand out.

FAISS-based cohort retrieval for patient similarity

LLM reasoning for optimal model selection

Dynamic personalized risk prediction using multi-modal data
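As a rough illustration of the LLM selection step, the sketch below builds a prompt from a retrieved cohort and its per-model metrics, then parses a model name out of the response. It is an assumption-laden sketch, not the paper's prompt or code: the prompt wording, the cohort name, the `fake_llm` stub, and the AUC placeholders are all hypothetical and are not results from the paper. Only the eight model names come from the abstract.

```python
import re

# The eight candidate models named in the abstract.
CANDIDATE_MODELS = ["Mayo", "Brock", "TDVIT", "DLSTM", "Liao", "Sybil", "DLS", "DLI"]


def build_prompt(cohort_name, cohort_summary, metrics):
    """Assemble a plain-text prompt (hypothetical wording) for the LLM."""
    lines = [
        f"Retrieved cohort: {cohort_name}",
        f"Cohort characteristics: {cohort_summary}",
        "Model performance on this cohort (AUC):",
    ]
    for model, auc in metrics.items():
        lines.append(f"- {model}: {auc:.2f}")
    lines.append("Recommend exactly one model from: " + ", ".join(CANDIDATE_MODELS))
    return "\n".join(lines)


def parse_recommendation(response, candidates):
    """Return the candidate mentioned earliest in the response, or None.

    Word boundaries keep "DLS" from matching inside "DLSTM".
    """
    best = None
    for name in candidates:
        match = re.search(r"\b" + re.escape(name) + r"\b", response)
        if match and (best is None or match.start() < best[0]):
            best = (match.start(), name)
    return best[1] if best else None


def fake_llm(prompt):
    """Stand-in for a real LLM call; returns a fixed, made-up answer."""
    return "Recommended model: DLSTM. Reason: longitudinal scans are available."


# Placeholder AUC values, invented for illustration; NOT results from the paper.
placeholder_metrics = {"Mayo": 0.70, "Sybil": 0.80, "DLSTM": 0.85}
prompt = build_prompt("cohort_C", "hypothetical screening cohort", placeholder_metrics)
choice = parse_recommendation(fake_llm(prompt), CANDIDATE_MODELS)
print(choice)
```

Constraining the LLM to a fixed candidate list and validating its answer against that list is one simple way to keep a free-text recommendation usable by downstream code.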

🔎 Similar Papers

Health-LLM: Personalized Retrieval-Augmented Disease Prediction System