🤖 AI Summary
Clinical adoption of black-box AI in stereotactic radiosurgery (SRS) has been hindered by its opacity and lack of interpretability. Method: We propose SAGE—the first large language model (LLM)-based agent system to integrate chain-of-thought (CoT) reasoning for fully automated single-fraction 18-Gy SRS planning for brain metastases. SAGE combines dose-constraint modeling, prospective constraint validation, multi-objective trade-off analysis, and human-in-the-loop feedback to enable auditable, traceable, human-like reasoning and to generate comprehensive optimization logs. Results: Experimental evaluation shows no statistically significant difference between SAGE and manual plans on key dosimetric metrics—including PTV coverage and maximum dose (p > 0.21)—while cochlear dose is significantly reduced (p = 0.022). The system executed 457 constraint validations and 609 trade-off analyses, directly addressing the core clinical trust barrier to AI in radiotherapy planning.
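SAGE's internal interfaces are not shown in the paper, but the "prospective constraint validation" step it describes—checking every dose constraint against a candidate optimization step *before* applying it, and logging the result for audit—can be illustrated with a minimal sketch. All names (`DoseConstraint`, `validate_prospectively`) and the example dose limits here are hypothetical, not SAGE's actual API:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DoseConstraint:
    structure: str   # organ at risk, e.g. "brainstem" or "cochlea"
    metric: str      # dosimetric metric, e.g. "Dmax" (maximum point dose, Gy)
    limit: float     # tolerance threshold the metric must not exceed

def validate_prospectively(constraints, predicted):
    """Check every constraint against the dose metrics predicted for a
    candidate optimization step, before the step is applied.
    Returns (all_passed, log); the log doubles as an audit trail."""
    log = []
    for c in constraints:
        value = predicted[(c.structure, c.metric)]
        passed = value <= c.limit
        log.append((c.structure, c.metric, c.limit, value, passed))
    return all(entry[-1] for entry in log), log

# Illustrative limits only (not the study's protocol values):
constraints = [
    DoseConstraint("brainstem", "Dmax", 15.0),
    DoseConstraint("cochlea", "Dmax", 9.0),
]
# Metrics predicted for a candidate step: one passes, one fails.
predicted = {("brainstem", "Dmax"): 12.3, ("cochlea", "Dmax"): 10.1}
ok, log = validate_prospectively(constraints, predicted)
```

In an agent loop, a `False` result would veto the candidate step, and the accumulated log entries are what make the optimization trace auditable.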
📝 Abstract
Stereotactic radiosurgery (SRS) demands precise dose shaping around critical structures, yet black-box AI systems have seen limited clinical adoption due to opacity concerns. We tested whether chain-of-thought reasoning improves agentic planning in a retrospective cohort of 41 patients with brain metastases treated with 18 Gy single-fraction SRS. We developed SAGE (Secure Agent for Generative Dose Expertise), an LLM-based agent for automated SRS treatment planning. Two variants generated plans for each case: one using a non-reasoning model and one using a reasoning model. The reasoning variant achieved plan dosimetry comparable to that of human planners on the primary endpoints (PTV coverage, maximum dose, conformity index, gradient index; all p > 0.21) while reducing cochlear dose below the human baseline (p = 0.022). When prompted to improve conformity, the reasoning model demonstrated systematic planning behaviors, including prospective constraint verification (457 instances) and trade-off deliberation (609 instances), whereas the standard model exhibited almost none of these deliberative behaviors (0 and 7 instances, respectively). Content analysis showed that constraint verification and causal explanation were concentrated in the reasoning agent. The optimization traces serve as auditable logs, offering a path toward transparent automated planning.