DiN: Diffusion Model for Robust Medical VQA with Semantic Noisy Labels

📅 2025-03-24

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Medical visual question answering (Med-VQA) models lose robustness because their labels often carry semantic noise and high-quality annotations are scarce. This paper introduces the first noisy-label benchmark for Med-VQA and proposes DiN, a diffusion-based framework. DiN adapts the diffusion generative paradigm to VQA: an Answer Diffuser generates answers coarse-to-fine under conditional guidance, and a Noisy Label Refinement module corrects labels dynamically during training. The method combines multimodal feature fusion, conditional embedding-based generation, and a robust loss function. Reported experiments show state-of-the-art performance across multiple Med-VQA datasets, with an average accuracy gain of 7.2% over prior methods and stable results under high-noise conditions.

📝 Abstract

Medical Visual Question Answering (Med-VQA) systems aid the interpretation of medical images containing critical clinical information. However, the challenge of noisy labels and limited high-quality datasets remains underexplored. To address this, we establish the first benchmark for noisy labels in Med-VQA by simulating human mislabeling with semantically designed noise types. More importantly, we introduce the DiN framework, which leverages a diffusion model to handle noisy labels in Med-VQA. Unlike the dominant classification-based VQA approaches that directly predict answers, our Answer Diffuser (AD) module employs a coarse-to-fine process, refining answer candidates with a diffusion model for improved accuracy. The Answer Condition Generator (ACG) further enhances this process by generating task-specific conditional information, integrating answer embeddings with fused image-question features. To address label noise, our Noisy Label Refinement (NLR) module introduces a robust loss function and dynamic answer adjustment to further boost the performance of the AD module.
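
The abstract does not say which representation the Answer Diffuser operates on or which noise schedule it uses. As a rough illustration of the "coarse" end of a coarse-to-fine diffusion process, the sketch below applies the standard Gaussian DDPM forward process to a one-hot answer vector. The one-hot representation, the linear schedule, and all function names are assumptions made for illustration, not details taken from the paper.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (a common DDPM default, assumed here)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alpha_bar(betas):
    """Running product of (1 - beta_t), i.e. alpha_bar_t for each step t."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def noise_answer(one_hot, t, alpha_bars, rng):
    """Sample x_t from q(x_t | x_0) = N(sqrt(a_bar_t) * x_0, (1 - a_bar_t) * I)."""
    a_bar = alpha_bars[t]
    return [
        math.sqrt(a_bar) * x + math.sqrt(1.0 - a_bar) * rng.gauss(0.0, 1.0)
        for x in one_hot
    ]


betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alpha_bar(betas)
rng = random.Random(0)

x0 = [0.0, 1.0, 0.0, 0.0]  # hypothetical one-hot answer over 4 candidates
x_early = noise_answer(x0, 10, alpha_bars, rng)   # still close to x0
x_late = noise_answer(x0, 999, alpha_bars, rng)   # close to pure Gaussian noise
```

A learned denoiser would run this process in reverse, starting from noise and refining toward an answer while conditioned on image-question features; that reverse model is the part the paper contributes and is not reproduced here.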

Problem

Research questions and friction points this paper is trying to address.

Address noisy labels in Medical VQA systems

Improve accuracy with diffusion-based answer refinement

Enhance robustness against semantic label noise
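
To make "semantic label noise" concrete, here is a minimal sketch of one way such noise could be simulated: with some probability, a clean answer is swapped for another answer from the same semantic group, mimicking a plausible human confusion. The groups, the function names, and the single-rate noise model are hypothetical; the paper's actual noise types are not specified in this summary.

```python
import random

# Hypothetical semantic groups: answers an annotator might plausibly confuse.
SEMANTIC_GROUPS = [
    ["left lung", "right lung"],
    ["ct", "mri", "x-ray"],
    ["yes", "no"],
]


def build_confusion_table(groups):
    """Map each answer to the other answers in its semantic group."""
    table = {}
    for group in groups:
        for answer in group:
            table[answer] = [other for other in group if other != answer]
    return table


def inject_semantic_noise(labels, noise_rate, groups, seed=0):
    """Replace each label, with probability noise_rate, by a same-group answer.

    Labels that belong to no group are left unchanged.
    """
    rng = random.Random(seed)
    table = build_confusion_table(groups)
    noisy = []
    for label in labels:
        candidates = table.get(label, [])
        if candidates and rng.random() < noise_rate:
            noisy.append(rng.choice(candidates))
        else:
            noisy.append(label)
    return noisy


clean = ["left lung", "ct", "yes", "pneumonia"]
noisy = inject_semantic_noise(clean, noise_rate=0.4, groups=SEMANTIC_GROUPS)
```

Unlike uniform random flipping, this keeps every corrupted label semantically close to the true one, which is what makes the noise hard to detect.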

Innovation

Methods, ideas, or system contributions that make the work stand out.

Diffusion model for noisy label handling

Coarse-to-fine answer refinement process

Dynamic noisy label adjustment mechanism
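
The summary names a robust loss and dynamic label adjustment but does not define either. The sketch below uses two well-known stand-ins to show the general idea: generalized cross-entropy as the robust loss, and a confidence-threshold rule for relabeling. Both choices, the threshold value, and the function names are assumptions and should not be read as the paper's NLR module.

```python
def generalized_cross_entropy(probs, label, q=0.7):
    """Generalized cross-entropy: (1 - p_y ** q) / q.

    Approaches standard cross-entropy as q -> 0 and reduces to 1 - p_y
    at q = 1, which penalizes confidently wrong labels less harshly.
    """
    if not 0.0 < q <= 1.0:
        raise ValueError("q must be in (0, 1]")
    return (1.0 - probs[label] ** q) / q


def adjust_label(probs, label, threshold=0.9):
    """Swap the given label for the model's prediction when it is confident.

    The label is kept unless the top prediction differs from it and its
    probability is at least `threshold` (a hypothetical hyperparameter).
    """
    best = max(range(len(probs)), key=probs.__getitem__)
    if best != label and probs[best] >= threshold:
        return best
    return label


# Hypothetical model output over two answers, with a given label of 0.
probs = [0.05, 0.95]
loss = generalized_cross_entropy(probs, label=0)
new_label = adjust_label(probs, label=0)  # model is confident in class 1
```

In a training loop, relabeling like this would typically start only after a warm-up period, so that early, unreliable predictions do not overwrite correct labels.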
