Regulating Anatomy-Aware Rewards via Trajectory-Integral Feedback for Volumetric Computed Tomography Analysis

📅 2026-05-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the susceptibility of existing medical vision-language models (VLMs) to evaluation hallucinations in 3D CT analysis: because current reinforcement learning setups rely on textual proxy rewards, the optimization objective rewards fluent wording rather than clinically correct findings. To mitigate this, the study brings control-theoretic principles into reinforcement learning for medical VLMs, proposing a Trajectory-Integral Feedback (TIF) mechanism that models clinical reasoning as a pseudo-temporal trajectory of anomaly discovery. TIF regulates anatomy-aware rewards through an integral feedback loop that penalizes persistent omissions and suppresses hallucinations. The authors also introduce the Clinical Abnormality Benchmarking Substrate (CABS), which decomposes radiology reports into verifiable clinical semantic units and is used to expose a mechanistic divergence in standard RL, and they develop the TIF-GRPO framework for fine-grained policy optimization. According to the authors, experiments on 3D CT benchmarks show significant gains in abnormality detection and clinical faithfulness.
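To make the misalignment described above concrete, here is a minimal sketch in Python of how a lexical proxy can score a clinically wrong report as a perfect match. The two report strings, the `(finding, present)` unit format, and both scoring functions are illustrative assumptions, not the paper's metrics or the CABS schema.

```python
# Illustrative only: a bag-of-words proxy versus a fact-level check.
# The unit format (finding, present) is a hypothetical stand-in for
# "verifiable clinical semantic units"; it is not the CABS schema.

def token_jaccard(a: str, b: str) -> float:
    """Surface similarity: Jaccard overlap of lowercase word sets."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)

def unit_f1(ref_units: set, hyp_units: set) -> float:
    """Fact-level agreement: F1 over exact (finding, present) matches."""
    tp = len(ref_units & hyp_units)
    if tp == 0:
        return 0.0
    precision = tp / len(hyp_units)
    recall = tp / len(ref_units)
    return 2 * precision * recall / (precision + recall)

# Same words, opposite clinical meaning.
reference = "no pleural effusion nodule in right upper lobe"
candidate = "pleural effusion no nodule in right upper lobe"

# Hand-labelled units for the two reports (assumed for the example).
ref_units = {("pleural effusion", False), ("nodule right upper lobe", True)}
hyp_units = {("pleural effusion", True), ("nodule right upper lobe", False)}

lexical_score = token_jaccard(reference, candidate)
factual_score = unit_f1(ref_units, hyp_units)
print(lexical_score, factual_score)
```

In this toy case the word sets are identical, so the lexical score is maximal even though both findings are inverted, while the unit-level score is zero. A reward built on the first quantity would reinforce exactly the kind of error the summary calls an evaluation hallucination.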

📝 Abstract

Medical vision-language models (VLMs) have rapidly advanced as general-purpose multimodal assistants, yet their deployment in 3D Computed Tomography (CT) analysis remains constrained by a persistent mismatch between optimization objectives and clinical rigor. Current Reinforcement Learning (RL) paradigms still rely on lexical proxy signals that induce "Evaluation Hallucinations", where models optimize linguistic fluency rather than factual clinical correctness, leading to diagnostically critical errors. To bridge this gap, we introduce the Clinical Abnormality Benchmarking Substrate (CABS), a structured system that decomposes radiology reports into verifiable clinical semantic units. Using CABS, we identify a "Mechanistic Divergence" in standard RL, where surface-similarity rewards drive policy gradients to bypass medical facts. We therefore propose Trajectory-Integral Feedback GRPO (TIF-GRPO), a novel framework integrating control-theoretic principles into policy optimization. By formulating clinical reasoning as a pseudo-temporal trajectory for anomaly discovery, TIF-GRPO regulates anatomy-aware rewards via an integral feedback loop that penalizes persistent omissions as cumulative state errors and suppresses hallucinations as excessive control effort. Experiments on 3D CT benchmarks demonstrate that our approach significantly enhances abnormality detection and clinical faithfulness, establishing a new paradigm for fine-grained regulation in medical VLMs. Our project is available on GitHub at https://github.com/ZJU4HealthCare/TIF-GRPO.
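The abstract gives no formulas, so the following Python sketch is only one possible reading of the integral feedback idea, under stated assumptions: a report is a sequence of steps, a missed finding contributes to an accumulated error at every step it stays unreported, and each step's hallucinated findings add a squared "control effort" cost. The `Step` type, the gains `k_hit`, `k_i`, `k_u`, and the reward form are all hypothetical. The `group_advantages` helper shows the standard group-relative normalization used by GRPO, not any TIF-specific variant.

```python
from dataclasses import dataclass
from statistics import mean, pstdev

@dataclass(frozen=True)
class Step:
    """One pseudo-temporal step of a report (hypothetical structure)."""
    due: frozenset = frozenset()       # reference findings that become due here
    reported: frozenset = frozenset()  # findings the model states here

def tif_reward(steps, k_hit=1.0, k_i=0.5, k_u=1.0):
    """Assumed reward: hits minus integral omission error minus effort."""
    reference = frozenset().union(*(s.due for s in steps))
    pending, found = set(), set()
    integral, effort = 0.0, 0.0
    for s in steps:
        pending |= s.due - found          # newly due findings start as omissions
        correct = s.reported & reference  # reported findings that are real
        found |= correct
        pending -= correct                # a late report stops the accumulation
        effort += len(s.reported - reference) ** 2  # hallucinations this step
        integral += len(pending)          # omissions still open after this step
    return k_hit * len(found) - k_i * integral - k_u * effort

def group_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: rewards standardized within one sampled group."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# The nodule is reported two steps late, alongside one hallucinated fracture.
late = [
    Step(due=frozenset({"nodule"})),
    Step(due=frozenset({"effusion"}), reported=frozenset({"effusion"})),
    Step(reported=frozenset({"nodule", "fracture"})),
]
# The nodule is never reported, so its error keeps accumulating.
never = [late[0], late[1], Step()]

rewards = [tif_reward(late), tif_reward(never)]
print(rewards, group_advantages(rewards))
```

Under these assumed gains the late-but-complete report scores higher than the one that never mentions the nodule, because the unresolved omission is charged again at every remaining step. How the actual method defines the trajectory, the error state, and the gains should be taken from the paper and its repository rather than from this sketch.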

Problem

Research questions and friction points this paper is trying to address.

Medical Vision-Language Models

3D Computed Tomography

Reinforcement Learning

Clinical Faithfulness

Evaluation Hallucinations

Innovation

Methods, ideas, or system contributions that make the work stand out.

Trajectory-Integral Feedback

Anatomy-Aware Rewards

Medical Vision-Language Models