From Answers to Arguments: Toward Trustworthy Clinical Diagnostic Reasoning with Toulmin-Guided Curriculum Goal-Conditioned Learning

📅 2026-04-13

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the problem that large language models often reach correct diagnostic conclusions through opaque or flawed reasoning, which falls short of the interpretability and reliability that clinical decision-making requires. The authors adapt the Toulmin model of argumentation to the diagnostic process and propose Curriculum Goal-Conditioned Learning (CGCL), a three-stage progressive training pipeline that guides models to construct structured diagnostic arguments. Evaluated with T-Eval, a quantitative framework that measures the integrity of diagnostic reasoning, the method achieves diagnostic accuracy and reasoning quality comparable to resource-intensive reinforcement learning methods while offering a more stable and efficient training pipeline.

📝 Abstract

The integration of Large Language Models (LLMs) into clinical decision support is critically obstructed by their opaque and often unreliable reasoning. In the high-stakes domain of healthcare, correct answers alone are insufficient; clinical practice demands full transparency to ensure patient safety and enable professional accountability. A pervasive and dangerous weakness of current LLMs is their tendency to produce "correct answers through flawed reasoning." This issue is far more than a minor academic flaw; such process errors signal a fundamental lack of robust understanding, making the model prone to broader hallucinations and unpredictable failures when faced with real-world clinical complexity. In this paper, we establish a framework for trustworthy clinical argumentation by adapting the Toulmin model to the diagnostic process. We propose a novel training pipeline, Curriculum Goal-Conditioned Learning (CGCL), designed to progressively train LLMs to generate diagnostic arguments that explicitly follow this Toulmin structure. CGCL's progressive three-stage curriculum systematically builds a solid clinical argument: (1) extracting facts and generating differential diagnoses; (2) justifying a core hypothesis while rebutting alternatives; and (3) synthesizing the analysis into a final, qualified conclusion. We validate CGCL using T-Eval, a quantitative framework measuring the integrity of the diagnostic reasoning. Experiments show that our method achieves diagnostic accuracy and reasoning quality comparable to resource-intensive Reinforcement Learning (RL) methods, while offering a more stable and efficient training pipeline.
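
The three-stage curriculum described in the abstract can be pictured as a structured argument that is filled in stage by stage. The following is a minimal illustrative sketch, not the authors' implementation and not T-Eval: the class `DiagnosticArgument`, its field names, the helper `missing_components`, and the mapping of fields to Toulmin components noted in the comments are all assumptions made for illustration. The sample findings are invented.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Stage(IntEnum):
    """The three curriculum stages named in the abstract."""
    FACTS_AND_DIFFERENTIAL = 1
    JUSTIFY_AND_REBUT = 2
    QUALIFIED_CONCLUSION = 3


@dataclass
class DiagnosticArgument:
    # Hypothetical schema: the paper's actual output format is not given here.
    facts: list[str] = field(default_factory=list)           # roughly Toulmin "grounds"
    differential: list[str] = field(default_factory=list)    # candidate diagnoses
    hypothesis: str = ""                                      # core hypothesis
    justification: str = ""                                   # roughly Toulmin "warrant"
    rebuttals: dict[str, str] = field(default_factory=dict)  # alternative -> why rejected
    qualifier: str = ""                                       # e.g. "most likely"
    conclusion: str = ""                                      # roughly Toulmin "claim"


# Components each stage adds to the goal (assumed grouping).
STAGE_FIELDS = {
    Stage.FACTS_AND_DIFFERENTIAL: ("facts", "differential"),
    Stage.JUSTIFY_AND_REBUT: ("hypothesis", "justification", "rebuttals"),
    Stage.QUALIFIED_CONCLUSION: ("qualifier", "conclusion"),
}


def missing_components(arg: DiagnosticArgument, stage: Stage) -> list[str]:
    """Return names of components required up to `stage` that are still empty."""
    missing = []
    for s in Stage:  # iterates in definition order: stage 1, 2, 3
        if s > stage:
            break
        missing.extend(name for name in STAGE_FIELDS[s] if not getattr(arg, name))
    return missing


if __name__ == "__main__":
    # Invented example case, complete for stage 1 only.
    arg = DiagnosticArgument(
        facts=["fever", "productive cough"],
        differential=["pneumonia", "bronchitis"],
    )
    print(missing_components(arg, Stage.FACTS_AND_DIFFERENTIAL))
    print(missing_components(arg, Stage.JUSTIFY_AND_REBUT))
```

In this sketch each stage's goal includes the components of the earlier stages, which is one plausible reading of "progressive"; how CGCL actually conditions on goals during training is not specified in the abstract.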

Problem

Research questions and friction points this paper is trying to address.

Large Language Models

clinical reasoning

trustworthy AI

diagnostic transparency

reasoning reliability

Innovation

Methods, ideas, or system contributions that make the work stand out.

Toulmin model

Curriculum Goal-Conditioned Learning

clinical diagnostic reasoning