Learning Diagnostic Reasoning for Decision Support in Toxicology

📅 2026-03-31
🤖 AI Summary
Acute poly-substance poisoning is hard to diagnose because symptoms are nonspecific and clinical information is incomplete, so unstructured on-scene narratives must be combined with structured vital-sign data. This work proposes DeToxR, which the authors describe as the first adaptation of reinforcement learning to emergency toxicology decision support. It fine-tunes a large language model with Group Relative Policy Optimization (GRPO) and uses a multi-label agreement metric as the reward, so that training directly optimizes a clinically relevant performance measure. The reward explicitly penalizes the model for omitting co-ingested substances and for predicting substances that are absent. On a 14-class multi-label toxic substance classification task, DeToxR significantly outperforms both supervised baselines and its unadapted base LLM. In a clinical validation study it reaches a Micro-F1 of 0.644, compared with 0.473 for an expert toxicologist, indicating potential for decision support in high-stakes settings.
📝 Abstract
Acute poly-substance intoxication requires rapid, life-saving decisions under substantial uncertainty, as clinicians must rely on incomplete ingestion details and nonspecific symptoms. Effective diagnostic reasoning in this chaotic environment requires fusing unstructured, non-medical narratives (e.g., paramedic scene descriptions and unreliable patient self-reports or known histories) with structured medical data such as vital signs. While Large Language Models (LLMs) show potential for processing such heterogeneous inputs, they struggle in this setting, often underperforming simple baselines that rely solely on patient histories. To address this, we present DeToxR (Decision-support for Toxicology with Reasoning), the first adaptation of Reinforcement Learning (RL) to emergency toxicology. We design a robust data-fusion engine for multi-label prediction across 14 substance classes, based on an LLM fine-tuned with Group Relative Policy Optimization (GRPO). We optimize the model's reasoning directly using a clinical performance reward. By formulating a multi-label agreement metric as the reward signal, the model is explicitly penalized for missing co-ingested substances and for hallucinating absent poisons. Our model significantly outperforms its unadapted base LLM counterpart and supervised baselines. Furthermore, in a clinical validation study, the model shows a clinical advantage, outperforming an expert toxicologist in identifying the correct poisons (Micro-F1: 0.644 vs. 0.473). These results demonstrate the potential of RL-aligned LLMs to synthesize unstructured pre-clinical narratives and structured medical data for decision support in high-stakes environments.
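To make the reward idea concrete, here is a minimal sketch of how a multi-label agreement score could act as a reward and be turned into group-relative advantages, as GRPO does when it normalizes each sampled completion's reward by the mean and standard deviation of its group. The abstract does not say which agreement metric is used, so the per-case set-level F1 below is an assumption, as are the function names and the substance labels (the 14 classes are not listed here).

```python
from statistics import mean, pstdev


def f1_reward(predicted: set[str], truth: set[str]) -> float:
    """Set-level F1 between predicted and true labels (assumed reward form).

    Missing a true substance lowers recall; predicting an absent one lowers
    precision, so both error types reduce the reward.
    """
    if not predicted and not truth:
        return 1.0  # nothing to find and nothing predicted
    tp = len(predicted & truth)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(truth)
    return 2 * precision * recall / (precision + recall)


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize rewards within one group of completions for the same case."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical case: two co-ingested substances, four sampled completions.
truth = {"opioid", "benzodiazepine"}
group = [
    {"opioid", "benzodiazepine"},  # exact match
    {"opioid"},                    # misses a co-ingested substance
    {"opioid", "ethanol"},         # one miss and one spurious label
    {"ethanol"},                   # entirely wrong
]
rewards = [f1_reward(p, truth) for p in group]
advantages = group_relative_advantages(rewards)
```

In this toy group the exact match receives the highest advantage and the entirely wrong prediction the lowest, which is the ordering a policy update would reinforce.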
Problem

Research questions and friction points this paper is trying to address.

toxicology
diagnostic reasoning
poly-substance intoxication
decision support
data fusion
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement Learning
Large Language Models
Data Fusion
Multi-label Prediction
Clinical Decision Support
Nico Oberländer
Computer Aided Medical Procedures, Technical University of Munich, Germany
David Bani-Harouni
Technical University of Munich
Tobias Zellner
Department of Clinical Toxicology and Poison Control Center Munich, TUM Klinikum rechts der Isar, Germany
Nassir Navab
Professor of Computer Science, Technische Universität München
Florian Eyer
Professor of Clinical Toxicology, Klinikum rechts der Isar, Technische Universität München
Toxicology
Matthias Keicher
Technische Universität München