🤖 AI Summary
Medical report generation (MRG) systems typically model reports as flat sequences, ignoring the structured workflow radiologists follow: describing visual findings, summarizing them into impressions, and carefully refining statements in clinically critical cases. The result is inconsistency between descriptive and diagnostic content. To address this, we propose RadFlow, a hierarchical, workflow-guided reinforcement fine-tuning framework aligned with clinical reporting practice. Its reward hierarchy has two levels: a global reward that combines linguistic fluency, medical-domain correctness, and consistency between the Finding and Impression sections, and a local, section-specific reward that emphasizes Impression quality. A critical-aware policy optimization mechanism additionally adapts regularization for high-risk or clinically sensitive cases. Evaluated on chest X-ray and carotid ultrasound datasets, RadFlow consistently improves diagnostic coherence and overall report quality over state-of-the-art baselines.
📝 Abstract
Radiologists compose diagnostic reports through a structured workflow: they describe visual findings, summarize them into impressions, and carefully refine statements in clinically critical cases. However, most existing medical report generation (MRG) systems treat reports as flat sequences, overlooking this hierarchical organization and leading to inconsistencies between descriptive and diagnostic content. To align model behavior with real-world reporting practices, we propose RadFlow, a hierarchical workflow-guided reinforcement optimization framework that explicitly models the structured nature of clinical reporting. RadFlow introduces a clinically grounded reward hierarchy that mirrors the organization of radiological reports. At the global level, the reward integrates linguistic fluency, medical-domain correctness, and cross-sectional consistency between Finding and Impression, promoting coherent and clinically faithful narratives. At the local level, a section-specific reward emphasizes Impression quality, reflecting its central role in diagnostic accuracy. Furthermore, a critical-aware policy optimization mechanism adaptively regularizes learning for high-risk or clinically sensitive cases, emulating the cautious refinement behavior of radiologists when documenting critical findings. Together, these components translate the structured reporting paradigm into the reinforcement fine-tuning process, enabling the model to generate reports that are both linguistically consistent and clinically aligned. Experiments on chest X-ray and carotid ultrasound datasets demonstrate that RadFlow consistently improves diagnostic coherence and overall report quality compared with state-of-the-art baselines.
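As a rough illustration of how the two reward levels and the critical-aware regularization described above could fit together, here is a minimal Python sketch. It is not RadFlow's actual implementation: the abstract does not give formulas, so the names `SectionScores`, `global_reward`, `hierarchical_reward`, and `regularization_coef`, the equal-weight average, the `local_weight` mixing factor, and the `boost` multiplier are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class SectionScores:
    """Hypothetical per-report scores, each assumed to lie in [0, 1]."""
    fluency: float      # linguistic fluency of the whole report
    correctness: float  # medical-domain correctness of the whole report
    consistency: float  # agreement between Finding and Impression
    impression: float   # quality of the Impression section alone


def global_reward(s: SectionScores) -> float:
    # Assumption: the three global terms are averaged with equal weight.
    return (s.fluency + s.correctness + s.consistency) / 3.0


def hierarchical_reward(s: SectionScores, local_weight: float = 0.5) -> float:
    # Assumption: a convex combination of the global reward and the
    # local (Impression-specific) reward.
    return (1.0 - local_weight) * global_reward(s) + local_weight * s.impression


def regularization_coef(base: float, is_critical: bool, boost: float = 2.0) -> float:
    # Assumption: critical cases get a stronger regularization coefficient,
    # so the policy is updated more cautiously on them.
    return base * boost if is_critical else base


scores = SectionScores(fluency=0.9, correctness=0.6, consistency=0.3, impression=1.0)
reward = hierarchical_reward(scores)
coef = regularization_coef(0.1, is_critical=True)
```

In this sketch a report with a strong Impression section is rewarded even when cross-section consistency is weak, which shows why the choice of `local_weight` matters; how the paper actually balances the two levels is not stated in the abstract.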