Epistemic-aware Vision-Language Foundation Model for Fetal Ultrasound Interpretation

📅 2025-10-14
🤖 AI Summary
Current medical vision-language models struggle with multi-view reasoning, a large number of disease categories, and high image heterogeneity in fetal ultrasound analysis. To address these challenges, we propose FetalMind, a clinically aligned vision-language model built around Salient Epistemic Disentanglement (SED). Specifically, we inject an expert-curated bipartite graph into the model and use reinforcement learning to guide it in disentangling view-disease associations along clinically grounded steps, which improves interpretability and stability. Trained on our large-scale FetalSigma-1M dataset, FetalMind outperforms both open- and closed-source baselines across all gestational stages. It delivers an average gain of 14% and 61.2% higher accuracy on critical conditions while remaining efficient, stable, and scalable.

📝 Abstract
Recent medical vision-language models have shown promise on tasks such as VQA, report generation, and anomaly detection. However, most are adapted to structured adult imaging and underperform in fetal ultrasound, which poses challenges of multi-view image reasoning, numerous diseases, and image diversity. To bridge this gap, we introduce FetalMind, a medical AI system tailored to fetal ultrasound for both report generation and diagnosis. Guided by clinical workflow, we propose Salient Epistemic Disentanglement (SED), which injects an expert-curated bipartite graph into the model to decouple view-disease associations and to steer preference selection along clinically faithful steps via reinforcement learning. This design mitigates variability across diseases and heterogeneity across views, reducing learning bottlenecks while aligning the model's inference with obstetric practice. To train FetalMind at scale, we curate the FetalSigma-1M dataset, the first large-scale fetal ultrasound report corpus, comprising 20K reports from twelve medical centers, addressing the scarcity of domain data. Extensive experiments show that FetalMind outperforms open- and closed-source baselines across all gestational stages, achieving +14% average gains and +61.2% higher accuracy on critical conditions while remaining efficient, stable, and scalable. Project Page: https://hexiao0275.github.io/FetalMind.
Problem

Research questions and friction points this paper is trying to address.

Addressing fetal ultrasound challenges with multi-view reasoning and diverse diseases
Developing specialized AI for fetal ultrasound report generation and diagnosis
Overcoming domain data scarcity through large-scale fetal ultrasound corpus creation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Salient Epistemic Disentanglement injects an expert-curated bipartite graph
Reinforcement learning steers preference selection along clinical steps
FetalSigma-1M dataset addresses fetal ultrasound data scarcity
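
To make the bipartite-graph idea concrete, the following is a minimal sketch of how a view-disease prior could be represented and queried. It is an assumption-laden illustration and not the authors' implementation: the view names, disease names, and helper functions (`invert`, `candidate_diseases`, `salient_views`) are all hypothetical, and the reinforcement learning component is not modeled here.

```python
from collections import defaultdict

# Hypothetical expert prior: which diseases each ultrasound view can reveal.
# View and disease names are illustrative only, not taken from FetalSigma-1M.
VIEW_TO_DISEASES = {
    "four_chamber_heart": {"ventricular_septal_defect", "hypoplastic_left_heart"},
    "transventricular_head": {"ventriculomegaly"},
    "transcerebellar_head": {"dandy_walker_malformation", "ventriculomegaly"},
}


def invert(view_to_diseases):
    """Build the disease -> views side of the bipartite graph."""
    disease_to_views = defaultdict(set)
    for view, diseases in view_to_diseases.items():
        for disease in diseases:
            disease_to_views[disease].add(view)
    return dict(disease_to_views)


def candidate_diseases(available_views, view_to_diseases):
    """Collect the diseases reachable from the views present in one exam."""
    candidates = set()
    for view in available_views:
        candidates |= view_to_diseases.get(view, set())
    return candidates


def salient_views(disease, available_views, disease_to_views):
    """Return the exam views that the prior links to a given disease."""
    linked = disease_to_views.get(disease, set())
    return sorted(linked & set(available_views))


if __name__ == "__main__":
    disease_to_views = invert(VIEW_TO_DISEASES)
    exam = ["transcerebellar_head", "four_chamber_heart"]
    for disease in sorted(candidate_diseases(exam, VIEW_TO_DISEASES)):
        print(disease, "<-", salient_views(disease, exam, disease_to_views))
```

Storing the graph as two adjacency maps keeps both query directions cheap: from the views in an exam to plausible diseases, and from a disease back to the views worth attending to.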
Authors

Xiao He
National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University

Huangxuan Zhao
Institute of Artificial Intelligence, School of Computer Science, Wuhan University
generative AI, deep learning, medical imaging

Guojia Wan
Wuhan University
Knowledge Graph, Graph for Science, GNN

Wei Zhou
National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University

Yanxing Liu
University of Chinese Academy of Sciences
Multimodal perception, Remote Sensing Object Detection, Few-shot learning

Juhua Liu
National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University

Yongchao Xu
National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University

Yong Luo
Wuhan University
Artificial Intelligence, Machine Learning, Data Mining, Pattern Classification and Search

Dacheng Tao
Nanyang Technological University
artificial intelligence, machine learning, computer vision, image processing, data mining

Bo Du
Department of Management, Griffith Business School
Sustainable Transport, Travel Behaviour, Urban Data Analytics, Logistics and Supply Chain