🤖 AI Summary
This paper addresses the challenge of detecting implicit bias in news, such as linguistic framing bias and image-text inconsistency, by proposing ViLBias, a multimodal bias detection framework. Methodologically, it jointly leverages textual and visual cues through the coordinated use of large language models (LLMs), vision-language models (VLMs), and small language models (SLMs), and introduces a hybrid annotation paradigm that combines LLM-assisted labeling with human verification. Key contributions include: (1) the first systematic evaluation of SLMs, LLMs, and VLMs for joint text-image bias detection, demonstrating LLMs' superior fine-grained recognition capability relative to SLMs; (2) empirical validation that joint image-text modeling improves detection accuracy by 3–5%; and (3) the release of the first benchmark dataset for multimodal news bias, covering diverse news sources and featuring fine-grained, human-verified annotations. These results provide both a novel methodology and reproducible resources for multimodal bias assessment.
📝 Abstract
The integration of Large Language Models (LLMs) and Vision-Language Models (VLMs) opens new avenues for addressing complex challenges in multimodal content analysis, particularly in biased news detection. This study introduces ViLBias, a framework that leverages state-of-the-art LLMs and VLMs to detect linguistic and visual biases in news content. We present a multimodal dataset comprising textual content and corresponding images from diverse news sources. We propose a hybrid annotation framework that combines LLM-based annotations with human review to ensure high-quality labeling while reducing costs and enhancing scalability. Our evaluation compares the performance of state-of-the-art Small Language Models (SLMs) and LLMs across both modalities (text and images); the results reveal that while SLMs are computationally efficient, LLMs achieve superior accuracy in identifying subtle framing and text-visual inconsistencies. Furthermore, empirical analysis shows that incorporating visual cues alongside textual data improves bias detection accuracy by 3 to 5%. This study provides a comprehensive exploration of LLMs, SLMs, and VLMs as tools for detecting multimodal bias in news content and highlights their respective strengths, limitations, and potential for future applications.