Enhancing Fine-Grained Spatial Grounding in 3D CT Report Generation via Discriminative Guidance

📅 2026-04-11
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
Existing methods for medical image report generation struggle to align lesion locations with their textual descriptions at a fine-grained spatial level, and they lack effective means of evaluating spatial grounding. This work proposes Discriminative Cue-Prompting with Prompt Dropout (DCP-PD), a plug-and-play framework that extracts discriminative cues from free-text reports to guide 3D CT report generation and uses prompt dropout to keep the model from relying on superficial shortcuts. The study further introduces a hierarchical, location-aware question-set protocol to directly assess pathology-to-location grounding. On the CT-RATE benchmark, the method achieves a macro F1 score of 0.603, a 20% relative improvement over the previous 0.501, and it generalizes substantially better to out-of-distribution Rad-ChestCT data, where F1 rises from 0.266 to 0.503, an 89% relative increase.
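
As a quick check of the relative gains quoted above, relative improvement is (new score minus old score) divided by the old score. The CT-RATE baseline of 0.501 is taken from the abstract; the remaining values appear in the summary itself.

```latex
\[
\frac{0.603 - 0.501}{0.501} = \frac{0.102}{0.501} \approx 0.204
\qquad\text{and}\qquad
\frac{0.503 - 0.266}{0.266} = \frac{0.237}{0.266} \approx 0.891
\]
```

These round to the reported 20% (CT-RATE) and 89% (Rad-ChestCT) relative improvements.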

📝 Abstract
Vision-language models (VLMs) for radiology report generation (RRG) can produce long-form chest CT reports from volumetric scans and show strong potential to improve radiology workflow efficiency and consistency. However, existing methods face two key limitations: (i) training supervision is often coarse, aligning a whole CT volume with a full free-text report without explicit alignment for fine-grained attributes or pathology locations; and (ii) evaluation is typically holistic (lexical overlap, entity matching, or LLM-as-a-judge scores) and not diagnostic of spatial grounding. We propose Discriminative Cue-Prompting with Prompt Dropout (DCP-PD), a plug-and-play framework that distills fine-grained cues from free-text reports and uses them to guide report generation while mitigating shortcut reliance via prompt dropout. DCP-PD achieves state-of-the-art performance on CT-RATE, improving macro F1 from $0.501$ to $0.603$ (20% relative), and substantially boosts out-of-distribution performance on Rad-ChestCT, improving F1 from $0.266$ to $0.503$ (89% relative). Finally, we introduce a hierarchical, location-aware question-set protocol (presence $\rightarrow$ laterality $\rightarrow$ lobe) to directly assess pathology-location grounding, showing that fine-grained spatial localization remains challenging even for models that score highly on current benchmarks.
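
To make the presence, laterality, lobe hierarchy concrete, the sketch below scores one finding level by level. It is only an illustration under stated assumptions: the `Finding` record, the `score_hierarchy` function, the label strings, and the rule that a deeper question is asked only when the shallower one was answered correctly are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Question levels in the order given in the abstract.
LEVELS = ("presence", "laterality", "lobe")


@dataclass(frozen=True)
class Finding:
    """Hypothetical record for one pathology in one scan."""
    present: bool
    laterality: Optional[str] = None  # assumed labels, e.g. "left" / "right"
    lobe: Optional[str] = None        # assumed labels, e.g. "upper" / "lower"


def score_hierarchy(truth: Finding, pred: Finding) -> Dict[str, Optional[bool]]:
    """Score each level as True/False, or None if the question was not asked.

    Assumption: a deeper question is only asked when every shallower
    question was answered correctly and the finding is truly present.
    """
    result: Dict[str, Optional[bool]] = {level: None for level in LEVELS}

    result["presence"] = truth.present == pred.present
    if not (truth.present and pred.present):
        return result  # nothing to localize, or the finding was missed

    result["laterality"] = truth.laterality == pred.laterality
    if not result["laterality"]:
        return result  # wrong side, so the lobe question is skipped

    result["lobe"] = truth.lobe == pred.lobe
    return result


truth = Finding(present=True, laterality="right", lobe="lower")
pred = Finding(present=True, laterality="right", lobe="upper")
scores = score_hierarchy(truth, pred)
```

In this example the prediction detects the finding on the correct side but in the wrong lobe, which is the kind of error a holistic report-level metric would not isolate.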

Problem

Research questions and friction points this paper is trying to address.

spatial grounding
radiology report generation
fine-grained alignment
3D CT
pathology localization

Innovation

Methods, ideas, or system contributions that make the work stand out.

Discriminative Cue-Prompting
Prompt Dropout
Spatial Grounding
Fine-Grained Localization
CT Report Generation
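
One plausible reading of prompt dropout is sketched below: during training, the cue prompt is withheld for a random fraction of samples, so the generator cannot learn to simply copy the cues and must still use the image. The source does not specify the mechanism, so the function name `apply_prompt_dropout`, the whole-prompt (rather than per-cue) dropping, the `"; "` separator, and the example cue strings are all assumptions made for illustration.

```python
import random
from typing import List


def apply_prompt_dropout(cues: List[str], drop_prob: float, rng: random.Random) -> str:
    """Build the cue prompt for one training sample.

    Assumption: with probability `drop_prob` the entire cue prompt is
    replaced by an empty string; otherwise the cues are joined as-is.
    """
    if not 0.0 <= drop_prob <= 1.0:
        raise ValueError("drop_prob must be in [0, 1]")
    if rng.random() < drop_prob:
        return ""  # cues withheld for this sample
    return "; ".join(cues)


# Hypothetical cues of the kind that might be distilled from a report.
cues = ["consolidation: present", "location: right lower lobe"]

rng = random.Random(0)  # seeded so the run is repeatable
prompts = [apply_prompt_dropout(cues, drop_prob=0.5, rng=rng) for _ in range(1000)]
dropped = sum(p == "" for p in prompts)
```

At inference time one would typically call the same function with `drop_prob=0.0`, so that any available cues are always passed through.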