🤖 AI Summary
This study addresses inaccurate lesion localization caused by the semantic-spatial gap between radiology report text and 3D CT images, and proposes a graph-guided, lesion-level alignment framework. The method treats each lesion description as an atomic semantic unit and uses a relation-aware graph neural network to infer organ affiliation, lesion attributes, and inter-lesion relationships, producing discriminative lesion queries. A region-level proposal verification mechanism, guided by anatomical priors, enforces one-to-one correspondence between textual descriptions and lesions. In addition, an octree-based autoregressive strategy progressively refines lesion boundaries in a hierarchical, coarse-to-fine manner. Experiments on AbdomenAtlas 3.0 show that the approach outperforms existing baselines, with consistent improvements in both lesion segmentation accuracy and localization precision.
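The one-to-one text-lesion correspondence can be viewed as an assignment problem. The following is a minimal sketch under stated assumptions, not the GLeVE implementation: it assumes each lesion query already has a similarity score against each region proposal, uses exact organ-label agreement as a hypothetical stand-in for the anatomical prior, and picks the admissible one-to-one assignment with the highest total score. `match_descriptions`, `min_score`, and the organ labels are illustrative names only, and brute-force search is used for clarity because a single report usually lists only a few lesions.

```python
from itertools import permutations
from typing import Dict, List, Optional, Sequence


def match_descriptions(
    scores: Sequence[Sequence[float]],
    desc_organs: Sequence[str],
    prop_organs: Sequence[str],
    min_score: float = 0.5,
) -> Dict[int, int]:
    """Return a one-to-one mapping {description index: proposal index}.

    scores[i][j] is an assumed similarity between lesion description i and
    region proposal j. A pair is admissible only if the organs agree (a crude,
    hypothetical stand-in for an anatomical prior) and the score reaches
    min_score; descriptions without an admissible proposal stay unmatched.
    """
    n_desc, n_prop = len(desc_organs), len(prop_organs)
    # Each proposal index appears once in slots, so no proposal can be reused;
    # None marks "no verified proposal" and may be assigned repeatedly.
    slots: List[Optional[int]] = list(range(n_prop)) + [None] * n_desc
    best_total = -1.0
    best: Dict[int, int] = {}
    # Exhaustive search over assignments: fine for a handful of lesions;
    # a Hungarian-style solver would replace this at scale.
    for assignment in permutations(slots, n_desc):
        total, valid = 0.0, True
        for i, j in enumerate(assignment):
            if j is None:
                continue
            if desc_organs[i] != prop_organs[j] or scores[i][j] < min_score:
                valid = False  # organ mismatch or weak evidence: reject pair
                break
            total += scores[i][j]
        if valid and total > best_total:
            best_total = total
            best = {i: j for i, j in enumerate(assignment) if j is not None}
    return best


if __name__ == "__main__":
    scores = [
        [0.80, 0.75, 0.10],  # description 0 (liver)
        [0.90, 0.10, 0.05],  # description 1 (liver)
        [0.70, 0.20, 0.60],  # description 2 (kidney)
    ]
    organs = ["liver", "liver", "kidney"]
    print(match_descriptions(scores, organs, organs))  # {0: 1, 1: 0, 2: 2}
```

A greedy pass that lets description 0 take its top-scoring proposal 0 would leave description 1 with no admissible match; the joint assignment instead pairs description 0 with proposal 1 and description 1 with proposal 0. This is the kind of case where a global one-to-one constraint matters, for example when several lesions sit in the same organ.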
📝 Abstract
Grounding radiology report descriptions to 3D CT volumes is essential for verifiable clinical interpretation, yet remains challenging due to the semantic-spatial gap between free-text narratives and volumetric anatomy. Existing report-assisted and vision-language grounding methods typically rely on phrase-level alignment or dense pixel supervision, resulting in limited lesion-wise correspondence and suboptimal localization accuracy. We propose GLeVE, a graph-guided lesion grounding framework with anatomical-prior verification and octree-based autoregressive refinement. GLeVE treats each lesion description as an atomic semantic unit and encodes organ attribution, lesion attributes, and inter-lesion relations through relation-aware graph reasoning to produce discriminative lesion-wise queries. Anatomy-aware proposal generation with region-level verification enforces one-to-one text-lesion alignment, while hierarchical octree refinement progressively improves boundary delineation. Experiments on AbdomenAtlas 3.0 demonstrate consistent gains over classical multimodal foundation models and report-supervised baselines in both segmentation accuracy and lesion-level localization.
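The hierarchical refinement idea can be sketched generically. Below is an illustrative sketch, not GLeVE's learned octree decoder (whose details are not given here): a cubic cell covering the volume is labeled inside, outside, or boundary, and only boundary cells are split into eight octants for the next level, so each level depends on the decisions made at the previous one and effort concentrates near the lesion surface. `sphere_classifier` is a hypothetical stand-in for a learned per-cell predictor (it labels cells exactly against a spherical lesion), and `Cell`, `refine`, and `max_depth` are illustrative names.

```python
from dataclasses import dataclass
from itertools import product
from math import dist
from typing import Callable, List, Tuple

Point = Tuple[float, float, float]


@dataclass(frozen=True)
class Cell:
    origin: Point  # minimum corner (x, y, z)
    size: float    # edge length of the cubic cell

    def children(self) -> List["Cell"]:
        """Split the cell into its 8 octants."""
        half = self.size / 2
        x, y, z = self.origin
        return [Cell((x + dx * half, y + dy * half, z + dz * half), half)
                for dx, dy, dz in product((0, 1), repeat=3)]


def sphere_classifier(center: Point, radius: float) -> Callable[[Cell], str]:
    """Hypothetical stand-in for a learned predictor: exact test against a sphere."""
    def classify(cell: Cell) -> str:
        lo = cell.origin
        hi = tuple(c + cell.size for c in lo)
        # Point of the box closest to the sphere center.
        nearest = tuple(min(max(c, l), h) for c, l, h in zip(center, lo, hi))
        # Box corner farthest from the sphere center.
        farthest = tuple(l if abs(c - l) > abs(c - h) else h
                         for c, l, h in zip(center, lo, hi))
        if dist(nearest, center) > radius:
            return "outside"
        if dist(farthest, center) <= radius:
            return "inside"
        return "boundary"
    return classify


def refine(classify: Callable[[Cell], str], root: Cell, max_depth: int):
    """Coarse-to-fine pass; returns (inside cells, finest-level boundary cells)."""
    inside: List[Cell] = []
    boundary: List[Cell] = []
    frontier = [root]
    for depth in range(max_depth + 1):
        next_frontier: List[Cell] = []
        for cell in frontier:
            label = classify(cell)
            if label == "inside":
                inside.append(cell)  # settled: never revisited
            elif label == "boundary":
                if depth == max_depth:
                    boundary.append(cell)  # finest level reached
                else:
                    next_frontier.extend(cell.children())  # undecided: split
            # "outside" cells are dropped without further work
        frontier = next_frontier
    return inside, boundary


if __name__ == "__main__":
    classify = sphere_classifier((32.0, 32.0, 32.0), 20.0)
    inside, boundary = refine(classify, Cell((0.0, 0.0, 0.0), 64.0), max_depth=4)
    lower = sum(c.size ** 3 for c in inside)
    upper = lower + sum(c.size ** 3 for c in boundary)
    print(f"{len(inside)} inside cells, {len(boundary)} boundary cells")
    print(f"lesion volume bracketed in [{lower:.0f}, {upper:.0f}] (voxel units)")
```

Because cells labeled inside or outside are never revisited, each level only spends work on the shrinking boundary band; the inside cells give a lower bound and the inside plus boundary cells an upper bound on the lesion volume, and the bracket tightens as `max_depth` grows.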