RadThinking: A Dataset for Longitudinal Clinical Reasoning in Radiology

📅 2026-05-11

🤖 AI Summary

Current medical imaging AI systems struggle to model the multi-step longitudinal reasoning that radiologists use in cancer screening, which integrates prior imaging with clinical guidelines. To address this gap, the work presents RadThinking, which the authors describe as, to their knowledge, the first cancer-screening visual question answering (VQA) dataset stratified by reasoning depth. It has three tiers: foundation (atomic perception), single-step reasoning, and compositional reasoning. Each compositional question is released with the chain of foundation questions that solves it, following the rules of the governing clinical reporting standard (for example, LI-RADS). The dataset spans 20,362 CT scans from 9,131 patients across 43 cancer groups, plus 2,077 verified healthy controls with more than one year of follow-up. The foundation tier supplies perception supervision, and the compositional tier supplies chain-of-thought data and verifiable rewards for reinforcement learning. RadThinking is intended as a benchmark for moving AI from perceptual tasks toward clinically grounded multi-step diagnostic reasoning.

📝 Abstract

Cancer screening is a reasoning task. A radiologist observes findings, compares them to prior scans, integrates clinical context, and reaches a diagnostic conclusion confirmed by pathology. We present RadThinking, a Visual Question Answering (VQA) dataset that makes this reasoning explicit and trainable. RadThinking releases VQA pairs at three difficulty tiers. Foundation VQAs are atomic perception questions. Single-step reasoning VQAs apply one clinical rule. Compositional VQAs require multi-step chain-of-thought to reach a guideline category such as LI-RADS-5. For every compositional VQA, we release the chain of foundation VQAs that solves it. The chain follows the rules of the governing clinical reporting standard. The dataset spans 20,362 CT scans from 9,131 patients across 43 cancer groups, plus 2,077 verified healthy controls with >1-year follow-up. To our knowledge, RadThinking is the first cancer-screening VQA corpus that stratifies questions by reasoning depth and grounds compositions in clinical reporting standards. The foundation tier supplies atomic perception supervision. The compositional tier supplies chain-of-thought data and verifiable rewards for reinforcement-learning recipes such as DeepSeek-R1 and OpenAI o1. RadThinking enables systematic training and evaluation of whether AI systems can reason about cancer, not merely detect it.
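To make the tiered structure concrete, here is a minimal Python sketch of how a compositional VQA might be linked to its chain of foundation VQAs, and how a verifiable reward could be computed from it. The abstract does not specify the release format or the reward definition, so every name here (Tier, VQA, chain_reward), the exact-match scoring, and the example questions and answers are assumptions made for illustration, not the authors' schema or data.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Tier(Enum):
    """The three difficulty tiers named in the abstract."""
    FOUNDATION = "foundation"        # atomic perception question
    SINGLE_STEP = "single_step"      # applies one clinical rule
    COMPOSITIONAL = "compositional"  # multi-step, ends in a guideline category


@dataclass
class VQA:
    # Hypothetical record layout; the real dataset fields may differ.
    question: str
    answer: str
    tier: Tier
    # For compositional items: the foundation VQAs that solve the question.
    chain: list[VQA] = field(default_factory=list)


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so exact matching is less brittle."""
    return " ".join(text.strip().lower().split())


def chain_reward(item: VQA, step_preds: list[str], final_pred: str) -> dict[str, float]:
    """Score a prediction against a compositional item (assumed scoring rule).

    Returns the exact-match score for the final answer and the fraction of
    chain steps answered correctly.
    """
    if len(step_preds) != len(item.chain):
        raise ValueError("expected one prediction per chain step")
    hits = sum(
        normalize(pred) == normalize(step.answer)
        for pred, step in zip(step_preds, item.chain)
    )
    steps = hits / len(item.chain) if item.chain else 0.0
    final = float(normalize(final_pred) == normalize(item.answer))
    return {"final": final, "steps": steps}


# Illustrative item only; questions and answers are made up for this sketch.
example = VQA(
    question="Which guideline category applies to this observation?",
    answer="LI-RADS-5",
    tier=Tier.COMPOSITIONAL,
    chain=[
        VQA("Does the lesion show arterial phase hyperenhancement?", "yes", Tier.FOUNDATION),
        VQA("Is the lesion at least 20 mm in diameter?", "yes", Tier.FOUNDATION),
        VQA("Does the lesion show washout?", "yes", Tier.FOUNDATION),
    ],
)

reward = chain_reward(example, ["Yes", "yes", "no"], "li-rads-5")
```

Keeping the final-answer score separate from the per-step score lets a training recipe reward a correct category while still checking whether the intermediate perception answers support it. How the two signals should be weighted is left open here.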

Problem

Research questions and friction points this paper is trying to address.

clinical reasoning

radiology

cancer screening

visual question answering

longitudinal analysis

Innovation

Methods, ideas, or system contributions that make the work stand out.

Visual Question Answering

Clinical Reasoning

Chain-of-Thought