🤖 AI Summary
This work addresses the challenge of localizing and quantifying implicit societal biases (e.g., gender and racial bias) in multimodal vision-language models. We propose the first vision-grounded bias detection and mitigation framework. By explicitly aligning visual patches with textual attributes, the method lifts bias assessment from the text level down to fine-grained image regions, enabling pixel-level bias analysis. It integrates contrastive vision-language modeling, interpretable attention analysis, causal intervention, and adversarial debiasing training to support automatic bias discovery, attribution localization, and dynamic mitigation; sketches of the two core mechanisms follow below.

Evaluated across multiple benchmarks, the approach reduces bias metrics by an average of 42% while preserving downstream task accuracy. Key contributions include: (1) the first pixel-level bias localization mechanism; (2) an end-to-end interpretable attribution framework; and (3) a lightweight debiasing strategy that requires no labeled bias data.
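The summary does not spell out how patches are aligned with attribute text, so the following is only a minimal sketch of one plausible instantiation: scoring every ViT patch token of an off-the-shelf CLIP against attribute prompts in the joint embedding space. The prompt list `ATTRIBUTES` and the function `patch_attribute_map` are hypothetical names, and pushing patch tokens through `post_layernorm` and `visual_projection` is a known approximation (CLIP is trained to align only the pooled token, so patch-level maps are noisy).

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Hypothetical attribute prompts; the paper's actual prompt set is not given.
ATTRIBUTES = ["a photo of a man", "a photo of a woman"]

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def patch_attribute_map(image: Image.Image) -> torch.Tensor:
    """Return a (num_attributes, H_patches, W_patches) similarity map."""
    inputs = processor(text=ATTRIBUTES, images=image,
                       return_tensors="pt", padding=True)

    # Text side: pooled, projected, L2-normalized attribute embeddings.
    text_emb = model.get_text_features(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"])
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)

    # Vision side: keep *all* patch tokens instead of the pooled [CLS]
    # token, then map them through the same projection into joint space.
    vision_out = model.vision_model(pixel_values=inputs["pixel_values"])
    patches = vision_out.last_hidden_state[:, 1:, :]   # drop [CLS]
    patches = model.vision_model.post_layernorm(patches)
    patches = model.visual_projection(patches)         # (1, N, D)
    patches = patches / patches.norm(dim=-1, keepdim=True)

    # Cosine similarity of every patch against every attribute prompt.
    sim = patches[0] @ text_emb.T                      # (N, num_attrs)
    side = int(sim.shape[0] ** 0.5)                    # 7x7 for ViT-B/32 at 224px
    return sim.T.reshape(len(ATTRIBUTES), side, side)
```

Upsampling such a map back to the input resolution would give the kind of region-level (loosely "pixel-level") bias heatmap the summary describes; where the map for one demographic prompt dominates irrelevant regions, that is a candidate bias location.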
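For the adversarial debiasing component, the summary gives no architecture, so here is a sketch of the standard gradient-reversal pattern (DANN-style): an adversary tries to predict a protected attribute from the shared features, and the reversed gradient pushes the encoder to discard that information. `GradReverse` and `DebiasedHead` are hypothetical names; since the paper claims no labeled bias data, one consistent reading is that the adversary's targets come from the model's own zero-shot attribute predictions (e.g., from the patch-attribute map above) rather than human annotations.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; flips and scales the gradient."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None  # no gradient w.r.t. lam

class DebiasedHead(nn.Module):
    """Task head plus an adversary over gradient-reversed features."""
    def __init__(self, dim: int, num_classes: int, num_attrs: int,
                 lam: float = 1.0):
        super().__init__()
        self.lam = lam
        self.task = nn.Linear(dim, num_classes)
        self.adversary = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, num_attrs))

    def forward(self, feats: torch.Tensor):
        task_logits = self.task(feats)
        # Reversal: the adversary minimizes its loss, while the encoder
        # upstream receives the negated gradient and maximizes it.
        attr_logits = self.adversary(GradReverse.apply(feats, self.lam))
        return task_logits, attr_logits

# Usage sketch: total loss = task CE + attribute CE on pseudo-labels.
feats = torch.randn(8, 512)  # stand-in for encoder features
head = DebiasedHead(dim=512, num_classes=10, num_attrs=2)
task_logits, attr_logits = head(feats)
```

Because the adversary is just a small head trained alongside the model, this pattern matches the summary's "lightweight" claim; the paper's exact loss weighting and schedule are not specified here.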