🤖 AI Summary
Medical report generation (MRG) faces three key challenges: insufficient domain knowledge modeling, misalignment between fine-grained visual and textual entity embeddings, and spurious cross-modal correlations. To address these, the authors propose HTSC-CIF, a hierarchical task-decomposition framework that handles domain knowledge understanding, fine-grained vision–language alignment, and causal debiasing within a single architecture, whereas prior work addresses these challenges only in isolation. The approach combines spatially aware alignment of medical entity features, Prefix Language Modeling, and Masked Image Modeling with a cross-modal causal intervention module based on front-door intervention, which is intended to reduce the influence of confounders. According to the abstract, the model significantly outperforms state-of-the-art MRG methods in extensive experiments, and the causal module is meant to improve interpretability and reduce reliance on spurious, bias-driven correlations.
📝 Abstract
Medical Report Generation (MRG) is a key part of modern medical diagnostics: it automatically generates reports from radiological images to reduce radiologists' burden. However, reliable MRG models for lesion description face three main challenges: insufficient understanding of domain knowledge, poor alignment between textual and visual entity embeddings, and spurious correlations arising from cross-modal biases. Previous work addresses these challenges only in isolation, whereas this paper tackles all three through a novel hierarchical task decomposition approach, the HTSC-CIF framework. HTSC-CIF maps the three challenges to low-, mid-, and high-level tasks: 1) Low-level: align medical entity features with spatial locations to strengthen the visual encoder's domain knowledge; 2) Mid-level: use Prefix Language Modeling (text) and Masked Image Modeling (images) to improve cross-modal alignment through mutual guidance; 3) High-level: apply a cross-modal causal intervention module (via front-door intervention) to reduce the effect of confounders and improve interpretability. Extensive experiments confirm the effectiveness of HTSC-CIF, which significantly outperforms state-of-the-art (SOTA) MRG methods. Code will be made public upon paper acceptance.
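The front-door intervention named in the high-level task can be illustrated with the textbook front-door adjustment formula, P(y | do(x)) = Σ_m P(m | x) Σ_x' P(y | m, x') P(x'). The sketch below is not the HTSC-CIF module, which the abstract does not specify in detail. It is a toy discrete example with made-up probabilities, and the variable roles (hidden confounder `U`, visual feature `X`, mediator `M`, report token `Y`) are assumptions chosen only for illustration. It shows that the front-door estimate, computed from observational data alone, recovers the true interventional probability, while the naive conditional P(y | x) does not.

```python
from itertools import product

# Hypothetical toy model; every number here is invented for illustration.
# U: hidden confounder, X: visual feature, M: mediator, Y: report token.
P_U = {0: 0.6, 1: 0.4}
P_X_GIVEN_U = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}  # [u][x]
P_M_GIVEN_X = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}  # [x][m]
P_Y1_GIVEN_MU = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.6, (1, 1): 0.9}  # P(Y=1 | m, u)


def p_y_given_mu(y, m, u):
    """P(Y=y | M=m, U=u) for binary Y."""
    p1 = P_Y1_GIVEN_MU[(m, u)]
    return p1 if y == 1 else 1.0 - p1


def observational_joint():
    """Return P(x, m, y) with the unobserved confounder U summed out."""
    joint = {}
    for u, x, m, y in product((0, 1), repeat=4):
        p = P_U[u] * P_X_GIVEN_U[u][x] * P_M_GIVEN_X[x][m] * p_y_given_mu(y, m, u)
        joint[(x, m, y)] = joint.get((x, m, y), 0.0) + p
    return joint


def marginal_x(joint):
    """Return P(x) computed from the observational joint."""
    return {a: sum(p for (x, _, _), p in joint.items() if x == a) for a in (0, 1)}


def naive_conditional(joint, x, y=1):
    """P(Y=y | X=x): biased by the confounder U."""
    p_x = marginal_x(joint)
    return sum(joint[(x, m, y)] for m in (0, 1)) / p_x[x]


def front_door(joint, x, y=1):
    """Front-door estimate of P(Y=y | do(X=x)) using only P(x, m, y)."""
    p_x = marginal_x(joint)
    total = 0.0
    for m in (0, 1):
        p_m_given_x = sum(joint[(x, m, yy)] for yy in (0, 1)) / p_x[x]
        inner = 0.0
        for x2 in (0, 1):
            p_x2_m = sum(joint[(x2, m, yy)] for yy in (0, 1))
            inner += (joint[(x2, m, y)] / p_x2_m) * p_x[x2]  # P(y | m, x') P(x')
        total += p_m_given_x * inner
    return total


def true_do(x, y=1):
    """Ground-truth P(Y=y | do(X=x)), which needs access to the hidden U."""
    return sum(
        P_M_GIVEN_X[x][m] * P_U[u] * p_y_given_mu(y, m, u)
        for m, u in product((0, 1), repeat=2)
    )


if __name__ == "__main__":
    joint = observational_joint()
    for x in (0, 1):
        print(x, naive_conditional(joint, x), front_door(joint, x), true_do(x))
```

The adjustment is valid here because the toy model satisfies the front-door conditions: `M` depends only on `X`, and `U` affects `Y` only directly, not through `M`. How HTSC-CIF approximates this adjustment with learned cross-modal features is left to the full paper.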