Unified Triplet-Level Hallucination Evaluation for Large Vision-Language Models

📅 2024-10-30

🏛️ arXiv.org

📈 Citations: 4

✨ Influential: 0

🤖 AI Summary

Existing hallucination evaluation benchmarks for Large Vision-Language Models (LVLMs) focus predominantly on object-level hallucinations. They overlook relation hallucinations, i.e., erroneously generated relationships between objects in an image, which are subtler but pervasive. Method: We propose a unified evaluation framework that operates on (object, relation, object) triplets and, built on it, the Tri-HE benchmark. The pipeline involves zero-shot triplet extraction, fine-grained hallucination annotation, and consistency verification, and the findings motivate a training-free mitigation approach based on structured-response reasoning constraints. Contribution/Results: Evaluations on Tri-HE show that relation hallucination is even more serious than object hallucination among existing LVLMs. With the proposed mitigation, our method outperforms all open-source LVLMs on Tri-HE and achieves performance comparable to GPT-4V. Both code and dataset are publicly released.

📝 Abstract

Despite their outstanding performance in vision-language reasoning, Large Vision-Language Models (LVLMs) may generate hallucinated content that does not exist in the given image. Most existing LVLM hallucination benchmarks are limited to evaluating object-related hallucinations. However, potential hallucinations about the relations between two objects, i.e., relation hallucinations, remain under-investigated. To remedy this, we design a unified framework that measures object and relation hallucination in LVLMs simultaneously. The core idea of our framework is to conduct hallucination evaluation on (object, relation, object) triplets extracted from LVLMs' responses, so that it can be easily generalized to different vision-language tasks. Based on this framework, we further introduce Tri-HE, a novel Triplet-level Hallucination Evaluation benchmark that can be used to study object and relation hallucination at the same time. We conduct comprehensive evaluations on Tri-HE and observe that relation hallucination is even more serious than object hallucination among existing LVLMs, highlighting a previously neglected problem on the path toward reliable LVLMs. Moreover, based on our findings, we design a simple yet effective training-free approach to mitigate hallucinations in LVLMs, with which we outperform all open-source counterparts on Tri-HE and achieve performance comparable to the powerful GPT-4V. Our dataset and code for reproducing our experiments are publicly available at https://github.com/wujunjie1998/Tri-HE.
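The triplet-level idea can be illustrated with a minimal sketch. It assumes that (object, relation, object) triplets have already been extracted from an LVLM response and that a ground-truth scene graph (a set of objects plus a set of triplets) is available for the image. Exact string matching stands in here for whatever judging procedure Tri-HE actually uses, and the names `Triplet`, `classify_triplet`, and `hallucination_rates` are hypothetical, not taken from the released code.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Triplet:
    """Hypothetical (subject, relation, object) triplet, e.g. ("man", "riding", "horse")."""
    subj: str
    rel: str
    obj: str


def classify_triplet(t: Triplet, gt_objects: set[str], gt_triplets: set[Triplet]) -> str:
    """Label a response triplet as 'correct', 'object', or 'relation' hallucination.

    Assumption: exact matching against a ground-truth scene graph replaces
    the (unspecified here) judge used by the benchmark.
    """
    if t.subj not in gt_objects or t.obj not in gt_objects:
        return "object"  # at least one entity does not appear in the image
    if t not in gt_triplets:
        return "relation"  # both entities exist, but the stated relation is unsupported
    return "correct"


def hallucination_rates(
    triplets: Iterable[Triplet], gt_objects: set[str], gt_triplets: set[Triplet]
) -> dict[str, float]:
    """Fraction of response triplets that are object or relation hallucinations."""
    labels = [classify_triplet(t, gt_objects, gt_triplets) for t in triplets]
    n = len(labels)
    if n == 0:
        return {"object": 0.0, "relation": 0.0}
    return {
        "object": labels.count("object") / n,
        "relation": labels.count("relation") / n,
    }


if __name__ == "__main__":
    gt_objects = {"man", "horse", "field"}
    gt_triplets = {Triplet("man", "riding", "horse"), Triplet("horse", "standing in", "field")}
    response = [
        Triplet("man", "riding", "horse"),       # supported by the scene graph
        Triplet("man", "holding", "umbrella"),   # "umbrella" is not in the image
        Triplet("horse", "jumping over", "man"), # real objects, unsupported relation
    ]
    print(hallucination_rates(response, gt_objects, gt_triplets))
```

Separating the object check from the relation check mirrors the distinction the abstract draws: a triplet can name only real objects and still be hallucinated because the relation between them is wrong.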

Problem

Research questions and friction points this paper is trying to address.

Evaluating object and relation hallucinations in LVLMs

Addressing the lack of relation hallucination benchmarks

Mitigating hallucinations in LVLMs via a training-free approach

Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified triplet-level framework for LVLM hallucination evaluation

Tri-HE benchmark for object and relation hallucination

Training-free approach to mitigate LVLM hallucinations

🔎 Similar Papers

MetaToken: Detecting Hallucination in Image Descriptions by Meta Classification