Can Language Models Identify Side Effects of Breast Cancer Radiation Treatments?

📅 2026-05-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses a difficulty clinicians face when communicating radiotherapy side effects to breast cancer survivors: the task is hindered by gaps in clinical knowledge about adverse treatment effects and by fragmentation across electronic health record (EHR) systems. It evaluates seven instruction-tuned large language models (LLMs) with a deployment-oriented stress-testing framework built on 21 breast cancer patient profiles, from which paired clinical scenarios are constructed that differ only in the radiotherapy regimen. Models are tested under multiple prompting strategies, and their outputs are compared with a clinician-curated reference standard derived from informed consent documents at two major academic medical centers and developed by a team that includes more than seven breast radiation oncologists. The reference maps dose-fractionation regimens, radiation fields, and treatment locations to associated toxicities, broken down by frequency and temporal onset. Results show that the LLMs are sensitive to minor documentation changes, trade precision against recall, and systematically under-recall rare and long-term side effects. Constraining only the number of side effects generated reduces precision, whereas grounding outputs in the clinician-curated side effect lists substantially improves reliability and robustness.
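
Grounding can be pictured as restricting a model's generated side effect list to the entries that a clinician-curated reference lists for the patient's regimen and radiation field. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: `ReferenceEntry`, `REFERENCE`, `allowed_side_effects`, and `ground_output` are hypothetical names, the table rows are illustrative placeholders rather than the study's reference standard, and matching is exact after lowercasing. The paper may ground outputs differently (for example, by supplying the reference list in the prompt).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceEntry:
    """One row of a hypothetical clinician-curated reference table."""
    regimen: str      # dose-fractionation, e.g. "40 Gy / 15 fx"
    field: str        # radiation field, e.g. "whole breast"
    side_effect: str  # normalized toxicity name
    frequency: str    # e.g. "common" or "rare"
    onset: str        # e.g. "acute" or "late"


# Illustrative placeholder rows only; NOT the study's reference standard.
REFERENCE = [
    ReferenceEntry("40 Gy / 15 fx", "whole breast", "skin redness", "common", "acute"),
    ReferenceEntry("40 Gy / 15 fx", "whole breast", "fatigue", "common", "acute"),
    ReferenceEntry("40 Gy / 15 fx", "whole breast", "breast fibrosis", "rare", "late"),
    ReferenceEntry("50 Gy / 25 fx", "regional nodes", "lymphedema", "rare", "late"),
]


def allowed_side_effects(regimen, field, reference=REFERENCE):
    """Return the side effects the reference lists for this regimen and field."""
    return {e.side_effect for e in reference if e.regimen == regimen and e.field == field}


def ground_output(llm_items, regimen, field, reference=REFERENCE):
    """Keep only generated items that match the reference (case-insensitive), deduplicated."""
    allowed = {s.lower(): s for s in allowed_side_effects(regimen, field, reference)}
    kept = []
    for item in llm_items:
        key = item.strip().lower()
        # Drop anything the reference does not list; keep first occurrence only.
        if key in allowed and allowed[key] not in kept:
            kept.append(allowed[key])
    return kept
```

For example, `ground_output(["Skin redness", "hair loss", "fatigue"], "40 Gy / 15 fx", "whole breast")` keeps `"skin redness"` and `"fatigue"` and drops `"hair loss"`, which the toy table does not list. A post-hoc filter like this can only remove items, so it can raise precision but cannot recover side effects the model omitted; recall still depends on what the model generates.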

📝 Abstract

Accurately communicating the side effects of cancer treatments to cancer survivors is critical, particularly in settings such as informed consent, where clinicians must clearly and comprehensively convey potential treatment toxicities. However, this task remains challenging due to clinical knowledge deficits about adverse treatment effects and fragmentation across electronic health record (EHR) systems. Large language models (LLMs) have the potential to assist in this task, though their reliability in oncology survivorship contexts remains poorly understood. We present a deployment-oriented stress-testing framework for evaluating LLM-generated radiation side effect lists in breast cancer treatment and survivorship care. Using 21 breast cancer patient profiles, we construct paired patient clinical scenarios that differ only in radiotherapy regimens to evaluate seven instruction-tuned LLMs under multiple prompting regimes. We then compare LLM outputs to a clinician-curated reference derived from informed consent documents at two major academic medical centers and developed by a team including more than seven breast radiation oncologists. The reference maps radiation dose-fractionation, fields, and locations to associated toxicities, broken down by frequency and temporal onset. Across models, we reveal sensitivity to minor documentation changes, trade-offs between precision and recall, and systematic under-recall of rare and long-term side effects. When used alone, constraints on the number of side effects generated reduce precision, whereas grounding outputs in clinician-curated side effect lists substantially improves reliability and robustness. These findings highlight important limitations of LLM use in oncology and suggest practical design choices for safer and more informative survivorship-focused applications.
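
The comparison against the clinician-curated reference can be sketched as set-based precision and recall, with recall broken down by the reference's frequency and onset labels so that under-recall of rare or late effects becomes visible, plus a simple overlap score for checking whether outputs stay stable across two inputs. This is a minimal sketch under assumptions: the function names `precision_recall`, `stratified_recall`, and `jaccard` are hypothetical, side effects are matched by exact lowercase string rather than by clinical synonym normalization (so "redness" and "erythema" would not match), and the paper's actual metrics and matching procedure may differ.

```python
from collections import defaultdict


def _normalize(items):
    """Lowercase and strip side effect names into a set (exact-match assumption)."""
    return {i.strip().lower() for i in items}


def precision_recall(predicted, reference):
    """Set-based precision and recall of predicted side effects vs a reference list."""
    pred, ref = _normalize(predicted), _normalize(reference)
    tp = len(pred & ref)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    return precision, recall


def stratified_recall(predicted, reference_labels):
    """Recall within each stratum.

    reference_labels maps a reference side effect to a stratum label,
    e.g. "rare/late" built from the reference's frequency and onset columns.
    """
    pred = _normalize(predicted)
    hits, totals = defaultdict(int), defaultdict(int)
    for effect, stratum in reference_labels.items():
        totals[stratum] += 1
        if effect.strip().lower() in pred:
            hits[stratum] += 1
    return {s: hits[s] / totals[s] for s in totals}


def jaccard(output_a, output_b):
    """Overlap of two generated lists; 1.0 means identical sets (both empty counts as 1.0)."""
    a, b = _normalize(output_a), _normalize(output_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0
```

As a toy usage example with placeholder labels, `stratified_recall(["Skin redness", "fatigue"], {"skin redness": "common/acute", "fatigue": "common/acute", "breast fibrosis": "rare/late"})` returns `{"common/acute": 1.0, "rare/late": 0.0}`: overall recall looks moderate while an entire stratum is missed, which is why breaking recall down by frequency and onset is informative. `jaccard` could be applied to a model's outputs for two inputs that differ only in wording to quantify sensitivity to documentation changes.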

Problem

Research questions and friction points this paper is trying to address.

side effects

breast cancer

radiation therapy

clinical communication

oncology survivorship

Innovation

Methods, ideas, or system contributions that make the work stand out.

stress-testing framework

large language models

radiation side effects