OncoReason: Structuring Clinical Reasoning in LLMs for Robust and Interpretable Survival Prediction

📅 2025-10-20
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing large language models (LLMs) struggle to achieve both high accuracy and clinical interpretability in cancer prognosis prediction, largely because they lack structured clinical reasoning capabilities. Method: To bridge this gap, we propose a multi-task alignment framework that jointly models survival classification, survival time regression, and natural language rationale generation, and we introduce Group Relative Policy Optimization (GRPO), a reinforcement learning method grounded in expert-defined clinical reasoning paths that optimizes predictive performance and structured reasoning fidelity together. Contribution/Results: Applied to LLaMA3-8B and Med42-8B backbones, our approach combines supervised fine-tuning with chain-of-thought prompting. It achieves a 6.0-point F1 improvement and a 12% reduction in mean absolute error (MAE) for survival time prediction, and it attains state-of-the-art BLEU, ROUGE, and BERTScore results for rationale generation, demonstrating strong synergy between prognostic accuracy and clinical interpretability.
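The group-relative idea at the heart of GRPO can be illustrated with a minimal sketch: for each prompt, a group of candidate completions is scored, and each candidate's advantage is its reward standardized against the group's mean and standard deviation, so no separate learned value network is needed. The reward values below are placeholders, not the paper's clinical reward design.

```python
import numpy as np

def grpo_advantages(rewards):
    """Group-relative advantages as in GRPO: standardize each sampled
    completion's reward within its group (mean 0, unit scale), replacing
    the value-network baseline used by PPO-style methods."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)

# Hypothetical rewards for G = 4 sampled rationales for one patient prompt,
# e.g. combining outcome-prediction accuracy and reasoning-path fidelity.
adv = grpo_advantages([0.9, 0.4, 0.7, 0.1])
```

In the paper's setting, these advantages would weight the policy-gradient update toward completions whose reasoning traces score well against the expert-defined clinical reasoning paths.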

📝 Abstract
Predicting cancer treatment outcomes requires models that are both accurate and interpretable, particularly in the presence of heterogeneous clinical data. While large language models (LLMs) have shown strong performance in biomedical NLP, they often lack structured reasoning capabilities critical for high-stakes decision support. We present a unified, multi-task learning framework that aligns autoregressive LLMs with clinical reasoning for outcome prediction on the MSK-CHORD dataset. Our models are trained to jointly perform binary survival classification, continuous survival time regression, and natural language rationale generation. We evaluate three alignment strategies: (1) standard supervised fine-tuning (SFT), (2) SFT with Chain-of-Thought (CoT) prompting to elicit step-by-step reasoning, and (3) Group Relative Policy Optimization (GRPO), a reinforcement learning method that aligns model outputs to expert-derived reasoning trajectories. Experiments with LLaMA3-8B and Med42-8B backbones demonstrate that CoT prompting improves F1 by +6.0 and reduces MAE by 12%, while GRPO achieves state-of-the-art interpretability and predictive performance across BLEU, ROUGE, and BERTScore. We further show that existing biomedical LLMs often fail to produce valid reasoning traces due to architectural constraints. Our findings underscore the importance of reasoning-aware alignment in multi-task clinical modeling and set a new benchmark for interpretable, trustworthy LLMs in precision oncology.
Problem

Research questions and friction points this paper is trying to address.

Enhancing LLMs' structured reasoning for cancer survival prediction
Improving interpretability and accuracy with clinical data alignment
Developing multi-task models for classification, regression, and rationale generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-task learning framework for clinical reasoning alignment
Chain-of-Thought prompting improves survival prediction accuracy
GRPO reinforcement learning enhances interpretability and performance
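The three tasks named above suggest a joint training objective along the following lines; the loss choices (binary cross-entropy, MAE, token negative log-likelihood) and equal default weights are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def multitask_loss(p_survive, y_cls, t_pred, t_true, logp_tokens,
                   w_cls=1.0, w_reg=1.0, w_lm=1.0):
    """Sketch of a joint loss over the three heads: binary cross-entropy
    for survival classification, MAE for survival-time regression, and
    mean negative log-likelihood over rationale tokens. Weights are
    hypothetical task-balancing coefficients."""
    eps = 1e-8
    bce = -np.mean(y_cls * np.log(p_survive + eps)
                   + (1 - y_cls) * np.log(1 - p_survive + eps))
    mae = np.mean(np.abs(t_pred - t_true))       # matches the MAE metric reported
    nll = -np.mean(logp_tokens)                  # rationale-generation term
    return w_cls * bce + w_reg * mae + w_lm * nll
```

Training a single backbone against a weighted sum like this is what lets one model serve all three outputs, rather than maintaining separate classifiers, regressors, and generators.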
Raghu Vamshi Hemadri
New York University
Reinforcement Learning · Natural Language Processing · AI Agents
Geetha Krishna Guruju
New York University, Tandon School of Engineering
Kristi Topollai
New York University, Tandon School of Engineering
Anna Ewa Choromanska
New York University, Tandon School of Engineering