DiN: Diffusion Model for Robust Medical VQA with Semantic Noisy Labels

📅 2025-03-24
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the prevalent semantic noise in labels and the scarcity of high-quality annotations—leading to poor model robustness in medical visual question answering (Med-VQA)—this paper introduces the first Med-VQA noisy-label benchmark and proposes DiN, a diffusion-based framework. DiN innovatively adapts the diffusion generative paradigm to VQA: it employs an Answer Diffuser for coarse-to-fine answer generation, integrates conditional information guidance, and incorporates a Noisy Label Refinement module for dynamic label correction. The method jointly leverages multimodal feature fusion, conditional embedding-based generation, and a robust loss function. Extensive experiments demonstrate significant improvements in noise robustness: DiN achieves state-of-the-art performance across multiple Med-VQA datasets, with an average accuracy gain of 7.2% over prior methods and exceptional stability under high-noise conditions.

Technology Category

Application Category

📝 Abstract
Medical Visual Question Answering (Med-VQA) systems benefit the interpretation of medical images containing critical clinical information. However, the challenge of noisy labels and limited high-quality datasets remains underexplored. To address this, we establish the first benchmark for noisy labels in Med-VQA by simulating human mislabeling with semantically designed noise types. More importantly, we introduce the DiN framework, which leverages a diffusion model to handle noisy labels in Med-VQA. Unlike the dominant classification-based VQA approaches that directly predict answers, our Answer Diffuser (AD) module employs a coarse-to-fine process, refining answer candidates with a diffusion model for improved accuracy. The Answer Condition Generator (ACG) further enhances this process by generating task-specific conditional information via integrating answer embeddings with fused image-question features. To address label noise, our Noisy Label Refinement(NLR) module introduces a robust loss function and dynamic answer adjustment to further boost the performance of the AD module.
Problem

Research questions and friction points this paper is trying to address.

Address noisy labels in Medical VQA systems
Improve accuracy with diffusion-based answer refinement
Enhance robustness against semantic label noise
Innovation

Methods, ideas, or system contributions that make the work stand out.

Diffusion model for noisy label handling
Coarse-to-fine answer refinement process
Dynamic noisy label adjustment mechanism
🔎 Similar Papers
No similar papers found.