Enhancing Fine-Grained Spatial Grounding in 3D CT Report Generation via Discriminative Guidance

📅 2026-04-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing methods for medical image report generation struggle to align lesion locations with their textual descriptions at a fine-grained level, and they lack an effective way to evaluate spatial grounding. This work proposes Discriminative Cue-Prompting with Prompt Dropout (DCP-PD), a plug-and-play framework that extracts discriminative cues from free-text reports to guide 3D CT report generation and uses prompt dropout to keep the model from relying on superficial shortcuts. The study also introduces a hierarchical, location-aware question-set protocol (presence, then laterality, then lobe) to directly assess pathology-to-location grounding. On the CT-RATE benchmark, the method raises macro F1 from 0.501 to 0.603, a 20% relative improvement, and it generalizes substantially better to out-of-distribution Rad-ChestCT data, where F1 rises from 0.266 to 0.503, an 89% relative increase.
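The summary does not say how prompt dropout is implemented. One plausible reading is that, during training, the cue-bearing prompt is randomly replaced by a cue-free one so the generator cannot simply copy the cues. The sketch below illustrates that reading only; the function name `build_prompt`, the prompt wording, the `p_drop` parameter, and the choice to drop all cues at once are assumptions, not details from the paper.

```python
import random
from typing import List

BASE_INSTRUCTION = "Generate a radiology report for this chest CT."


def build_prompt(cues: List[str], p_drop: float, rng: random.Random,
                 training: bool = True) -> str:
    """Build a text prompt, optionally dropping the discriminative cues.

    Hypothetical helper: with probability `p_drop` (training only) the cues
    are withheld, so the model must also learn to rely on the image itself.
    """
    # Dropout applies only in training; at inference the cues are always kept.
    if training and rng.random() < p_drop:
        return BASE_INSTRUCTION
    if not cues:
        return BASE_INSTRUCTION
    return BASE_INSTRUCTION + " Finding cues: " + "; ".join(cues) + "."
```

With `p_drop=0.0` every training sample carries its cues, and with `p_drop=1.0` none do; intermediate values mix the two regimes. Per-cue dropout would be an equally reasonable variant under the same assumption.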

📝 Abstract

Vision--language models (VLMs) for radiology report generation (RRG) can produce long-form chest CT reports from volumetric scans and show strong potential to improve radiology workflow efficiency and consistency. However, existing methods face two key limitations: (i) training supervision is often coarse, aligning a whole CT volume with a full free-text report without explicit alignment for fine-grained attributes or pathology locations; and (ii) evaluation is typically holistic (lexical overlap, entity matching, or LLM-as-a-judge scores) and not diagnostic for spatial grounding. We propose \emph{Discriminative Cue-Prompting with Prompt Dropout (DCP-PD)}, a plug-and-play framework that distills fine-grained cues from free-text reports and uses them to guide report generation while mitigating shortcut reliance via prompt dropout. DCP-PD achieves state-of-the-art performance on CT-RATE, improving macro F1 from $0.501$ to $0.603$ (20% relative), and substantially boosts out-of-distribution performance on Rad-ChestCT from F1 $=0.266$ to $0.503$ (89% relative). Finally, we introduce a hierarchical, location-aware question-set protocol (presence $\rightarrow$ laterality $\rightarrow$ lobe) to directly assess pathology-location grounding, showing that fine-grained spatial localization remains challenging even for models that score highly on current benchmarks.
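The hierarchical question-set protocol (presence, then laterality, then lobe) can be pictured as a graded descent in which a finer question is only asked once the coarser one has been settled. The abstract does not give the scoring rules, so the following is a minimal sketch under stated assumptions: the label vocabulary, the `grade` function, and the rule that descent stops at the first wrong answer or at an absent pathology are all hypothetical.

```python
from typing import Dict, Optional

# Levels in the order named in the abstract.
LEVELS = ("presence", "laterality", "lobe")


def grade(truth: Dict[str, Optional[str]],
          pred: Dict[str, Optional[str]]) -> Dict[str, bool]:
    """Grade one pathology level by level (assumed gating rule).

    Returns a mapping from each level that was actually asked to whether
    the prediction matched the reference label at that level.
    """
    results: Dict[str, bool] = {}
    for level in LEVELS:
        expected = truth.get(level)
        if expected is None:
            break  # no reference label at this depth, so nothing to ask
        correct = pred.get(level) == expected
        results[level] = correct
        # Assumption: finer questions are skipped after a wrong answer,
        # and location is not asked when the pathology is absent.
        if not correct or (level == "presence" and expected == "no"):
            break
    return results
```

For example, a reference of `{"presence": "yes", "laterality": "right", "lobe": "upper"}` paired with a prediction that gets presence and laterality right but names the lower lobe is graded correct at the first two levels and incorrect at the third. Aggregating these per-level outcomes over many cases would show where grounding breaks down.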

Problem

Research questions and friction points this paper is trying to address.

spatial grounding

radiology report generation

fine-grained alignment

3D CT

pathology localization

Innovation

Methods, ideas, or system contributions that make the work stand out.

Discriminative Cue-Prompting

Prompt Dropout

Spatial Grounding