🤖 AI Summary
Multimodal large language models (MLLMs) suffer from severe object hallucination, while existing mitigation methods either incur high computational overhead or introduce a distribution mismatch between training data and model outputs. Method: This paper proposes a sentence-level early-intervention framework, grounded in the observation that hallucinations predominantly emerge during the initial stages of text generation and then propagate through subsequent outputs. We introduce an in-domain preference learning mechanism requiring no human annotation: model outputs are sampled iteratively, two open-vocabulary detectors cross-validate object existence, and each sentence is classified as hallucinated or non-hallucinated to automatically construct high-quality preference data. Additionally, we propose a context-aware Direct Preference Optimization loss (C-DPO) that focuses discriminative learning on the sentence level, where hallucinations first appear. Contribution/Results: Experiments show that hallucinations are reduced by over 90% relative to the original model, and that the method outperforms the previous state of the art on both hallucination benchmarks and general-capability benchmarks. Code, models, and data are publicly released.
📝 Abstract
Multimodal large language models (MLLMs) have revolutionized cross-modal understanding but continue to struggle with hallucinations, that is, fabricated content that contradicts the visual input. Existing hallucination mitigation methods either incur prohibitive computational costs or introduce distribution mismatches between training data and model outputs. We identify a critical insight: hallucinations predominantly emerge at the early stages of text generation and propagate through subsequent outputs. To address this, we propose **SENTINEL** (**S**entence-level **E**arly i**N**tervention **T**hrough **IN**-domain pr**E**ference **L**earning), a framework that eliminates dependency on human annotations. Specifically, we first bootstrap high-quality in-domain preference pairs by iteratively sampling model outputs, validating object existence through cross-checking with two open-vocabulary detectors, and classifying sentences into hallucinated and non-hallucinated categories. Subsequently, we use context-coherent positive samples and hallucinated negative samples to iteratively build context-aware preference data. Finally, we train models using a context-aware preference loss (C-DPO) that emphasizes discriminative learning at the sentence level, where hallucinations initially manifest. Experimental results show that SENTINEL can reduce hallucinations by over 90% compared to the original model and outperforms the previous state-of-the-art method on both hallucination benchmarks and general-capability benchmarks, demonstrating its superiority and generalization ability. The models, datasets, and code are available at https://github.com/pspdada/SENTINEL.
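The data-construction and training steps described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the released SENTINEL implementation: the names `classify_sentence`, `build_pairs`, `PreferencePair`, and `dpo_loss` are hypothetical, the detectors are stood in for by plain sets of detected object names, the "uncertain" label for detector disagreement is an assumption, and the loss shown is the standard DPO objective evaluated on log-probabilities that are assumed to be conditioned on the shared context. The abstract does not give the exact form of C-DPO.

```python
import math
from dataclasses import dataclass


@dataclass
class PreferencePair:
    context: str   # sentences generated so far (assumed hallucination-free)
    chosen: str    # non-hallucinated candidate for the next sentence
    rejected: str  # hallucinated candidate for the next sentence


def classify_sentence(mentioned, detected_a, detected_b):
    """Label one sentence from the objects it mentions and two detectors' outputs.

    All three arguments are sets of object names. The "uncertain" label
    (detectors disagree, or no object is mentioned) is an assumption here.
    """
    if not mentioned:
        return "uncertain"
    # An object that neither detector finds is treated as hallucinated.
    if any(o not in detected_a and o not in detected_b for o in mentioned):
        return "hallucinated"
    # Every mentioned object is confirmed by both detectors.
    if all(o in detected_a and o in detected_b for o in mentioned):
        return "non_hallucinated"
    return "uncertain"


def build_pairs(context, labeled_candidates):
    """Pair every non-hallucinated candidate with every hallucinated one.

    labeled_candidates is a list of (sentence, label) tuples sampled for the
    same context, so chosen and rejected share an identical prefix.
    """
    chosen = [s for s, lab in labeled_candidates if lab == "non_hallucinated"]
    rejected = [s for s, lab in labeled_candidates if lab == "hallucinated"]
    return [PreferencePair(context, c, r) for c in chosen for r in rejected]


def dpo_loss(pol_chosen, pol_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one pair: -log(sigmoid(margin)).

    Inputs are summed token log-probabilities of the chosen and rejected
    sentences under the policy and reference models (assumed to be
    conditioned on image, prompt, and shared context).
    """
    margin = beta * ((pol_chosen - ref_chosen) - (pol_rejected - ref_rejected))
    # Numerically stable softplus(-margin).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

In this sketch, sentences labeled "uncertain" are simply left out of the preference data, and because both candidates in a pair share the same context, the loss only has to discriminate at the sentence where the hallucination first appears.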