Simple Radiology VLLM Test-time Scaling with Thought Graph Traversal

📅 2025-06-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Frozen vision-language large models (VLLMs) show limited reasoning capacity in radiology report generation, which hinders their deployment without fine-tuning. Method: We propose a training-free test-time scaling approach centered on a lightweight Thought Graph Traversal (TGT) framework. TGT places structured medical priors in the prompt to guide the model through organ-specific findings in a clinically coherent order, and it combines this with a reasoning budget forcing strategy that dynamically extends generation, yielding traceable, self-correcting inference. Contribution/Results: To our knowledge, this is the first method enabling deep test-time scaling on frozen VLLMs. On standard chest X-ray report generation benchmarks, it outperforms baseline prompting approaches and reveals dataset biases through its traceable reasoning paths. Code and prompts are open-sourced for reproducibility.

📝 Abstract
Test-time scaling offers a promising way to improve the reasoning performance of vision-language large models (VLLMs) without additional training. In this paper, we explore a simple but effective approach for applying test-time scaling to radiology report generation. Specifically, we introduce a lightweight Thought Graph Traversal (TGT) framework that guides the model to reason through organ-specific findings in a medically coherent order. This framework integrates structured medical priors into the prompt, enabling deeper and more logical analysis with no changes to the underlying model. To further enhance reasoning depth, we apply a reasoning budget forcing strategy that adjusts the model's inference depth at test time by dynamically extending its generation process. This simple yet powerful combination allows a frozen radiology VLLM to self-correct and generate more accurate, consistent chest X-ray reports. Our method outperforms baseline prompting approaches on standard benchmarks, and also reveals dataset biases through traceable reasoning paths. Code and prompts are open-sourced for reproducibility at https://github.com/glerium/Thought-Graph-Traversal.
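
The traversal idea in the abstract can be illustrated with a minimal sketch. It assumes the thought graph is a small dependency graph over regions to inspect and that the prompt simply lists those regions in a valid visiting order. The node names, the edges, and the helper names `traversal_order` and `build_prompt` are hypothetical and are not taken from the paper or its repository.

```python
from graphlib import TopologicalSorter

# Hypothetical thought graph: each region maps to the regions that should be
# examined before it. This is an illustration, not the graph used by the authors.
THOUGHT_GRAPH = {
    "airway": set(),
    "lungs": {"airway"},
    "pleura": {"lungs"},
    "heart": {"airway"},
    "mediastinum": {"heart"},
    "bones": set(),
    "impression": {"pleura", "mediastinum", "bones"},
}


def traversal_order(graph):
    """Return the nodes so that every node comes after its prerequisites."""
    return list(TopologicalSorter(graph).static_order())


def build_prompt(graph):
    """Turn the traversal order into numbered reasoning steps for the prompt."""
    steps = [
        f"{i}. Describe findings for: {node}"
        for i, node in enumerate(traversal_order(graph), start=1)
    ]
    return "Examine the chest X-ray step by step.\n" + "\n".join(steps)


if __name__ == "__main__":
    print(build_prompt(THOUGHT_GRAPH))
```

Because the ordering lives entirely in the prompt text, a sketch like this leaves the underlying model untouched, which matches the training-free setting the abstract describes.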

Problem

Research questions and friction points this paper is trying to address.

Improve radiology report generation without additional training
Guide reasoning with organ-specific findings in medical order
Enhance reasoning depth via dynamic inference adjustment

Innovation

Methods, ideas, or system contributions that make the work stand out.

Lightweight Thought Graph Traversal framework
Structured medical priors in prompts
Dynamic reasoning budget forcing strategy
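
The budget forcing item above can be sketched as a loop that refuses to let the reasoning stop until a minimum budget is spent. This is a minimal sketch under stated assumptions: the end-of-reasoning marker, the continuation cue, the word-count budget, and the `generate` callable (a stand-in for a call to a frozen VLLM) are all hypothetical, and the authors' actual strategy may differ.

```python
END_OF_THINKING = "</think>"  # hypothetical end-of-reasoning marker
CONTINUE_CUE = "Wait,"        # hypothetical cue that prompts further reasoning


def budget_forced_reasoning(generate, prompt, min_words=40, max_extensions=2):
    """Extend the reasoning until a minimum word budget is met.

    `generate(text)` returns a continuation string. Words are used as a rough
    proxy for tokens; a real system would count tokens instead.
    """
    reasoning = ""
    extensions = 0
    while True:
        chunk = generate(prompt + reasoning)
        # Keep only the text before the model tried to stop reasoning.
        reasoning += chunk.split(END_OF_THINKING)[0]
        if len(reasoning.split()) >= min_words or extensions >= max_extensions:
            return reasoning.strip()
        # Budget not yet spent: suppress the stop and nudge the model onward.
        reasoning += f" {CONTINUE_CUE} "
        extensions += 1


def toy_model(text):
    """Toy stand-in for a frozen model that always stops after one sentence."""
    return "The lungs appear clear." + END_OF_THINKING


if __name__ == "__main__":
    print(budget_forced_reasoning(toy_model, "Report:"))
```

The `max_extensions` cap is a design choice of this sketch so that the loop always terminates even when the model keeps stopping early.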
Yue Yao
Shandong University, No. 17923 Jingshi Road, Jinan, 250061, Shandong, China
Zelin Wen
Shandong University, No. 17923 Jingshi Road, Jinan, 250061, Shandong, China
Yan Tong
Professor of Computer Science and Engineering, University of South Carolina
Computer Vision, Machine Learning
Xinyu Tian
Shandong University, No. 17923 Jingshi Road, Jinan, 250061, Shandong, China
Xuqing Li
Shandong University, No. 17923 Jingshi Road, Jinan, 250061, Shandong, China
Xiao Ma
Shandong University, No. 17923 Jingshi Road, Jinan, 250061, Shandong, China
Dongliang Xu
Curtin University, Kent Street, Perth, 6102, Western Australia, Australia
Tom Gedeon
Human-Centric Advancements Chair in AI, Curtin University
Responsive AI, Neural / Deep Learning, Responsible AI, Human-Centered AI, Affective Computing