🤖 AI Summary
Existing evaluation benchmarks focus primarily on coarse-grained visual questions, making them inadequate for assessing hallucination in multimodal large language models (MLLMs) under fine-grained negative queries. This work presents the first systematic investigation of this issue and introduces FINER, a fine-grained negative query framework, along with two new benchmarks, FINER-CompreCap and FINER-DOCCI, that cover multi-object, multi-attribute, multi-relation, and “what”-type questions. Building on this framework, the authors propose FINER-Tuning, a fine-tuning approach that applies Direct Preference Optimization (DPO) to FINER-inspired data. Applied to four frontier MLLMs, FINER-Tuning yields gains of up to 24.2% (on InternVL3.5-14B) on the hallucination settings of the FINER benchmarks, and it also improves performance on eight existing hallucination benchmarks and six general-purpose multimodal benchmarks.
📝 Abstract
Multimodal large language models (MLLMs) struggle with hallucinations, particularly with fine-grained queries, a challenge underrepresented by existing benchmarks that focus on coarse image-related questions. We introduce FIne-grained NEgative queRies (FINER), alongside two benchmarks: FINER-CompreCap and FINER-DOCCI. Using FINER, we analyze hallucinations across four settings: multi-object, multi-attribute, multi-relation, and "what" questions. Our benchmarks reveal that MLLMs hallucinate when fine-grained mismatches co-occur with genuinely present elements in the image. To address this, we propose FINER-Tuning, leveraging Direct Preference Optimization (DPO) on FINER-inspired data. Finetuning four frontier MLLMs with FINER-Tuning yields up to 24.2% gains (InternVL3.5-14B) on hallucinations from our benchmarks, while simultaneously improving performance on eight existing hallucination suites, and enhancing general multimodal capabilities across six benchmarks. Code, benchmark, and models are available at https://explainableml.github.io/finer-project/.
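
For readers unfamiliar with DPO, the following is a minimal sketch of the standard per-example DPO objective that preference-based tuning of this kind builds on. It is not the authors' implementation: the abstract does not say how FINER-Tuning constructs its preference pairs or which hyperparameters it uses, so the function names, the `beta` value, and the numeric inputs are illustrative assumptions. The inputs are summed token log-probabilities of a preferred ("chosen") and a dispreferred ("rejected") response under the model being tuned and under a frozen reference model.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value; the abstract does not specify it
) -> float:
    """Standard DPO loss for a single (chosen, rejected) response pair."""
    # Implicit rewards: how much more likely each response is under the
    # policy than under the frozen reference model (in log space).
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp
    # The loss shrinks as the policy favors the chosen response more
    # strongly than the reference model does.
    return -log_sigmoid(beta * (chosen_reward - rejected_reward))


# Hypothetical pair for a negative query about an image: the chosen response
# rejects the mismatched detail, the rejected response hallucinates it.
# The log-probabilities below are made-up numbers.
loss = dpo_loss(
    policy_chosen_logp=-4.0,
    policy_rejected_logp=-6.0,
    ref_chosen_logp=-5.0,
    ref_rejected_logp=-5.0,
)
```

When the policy and the reference model assign identical log-probabilities, the loss equals log 2. It falls below that value as the policy shifts probability toward the chosen response, which is the behavior that preference tuning on negative-query data is meant to encourage.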