🤖 AI Summary
This work addresses fine-grained hallucinations in video captioning by introducing, for the first time, the **segment-level hallucination localization task**, which identifies erroneous words or phrases within captions, moving beyond coarse sentence-level hallucination detection. To support this task, we construct HLVC-Dataset, the first finely annotated benchmark, comprising 1,167 video–caption pairs; captions are initially generated by VideoLLMs, then verified and annotated for hallucinations by human experts at the word/phrase level. We further design a tailored VideoLLM-based baseline model and conduct comprehensive quantitative and qualitative evaluations. Experiments demonstrate that our approach effectively localizes hallucinated segments, significantly improving error traceability and diagnostic precision. This work establishes a novel task paradigm, provides the first dedicated benchmark dataset, and lays methodological foundations for fine-grained analysis and evaluation of multimodal hallucinations in video understanding.
📝 Abstract
We propose a novel task, hallucination localization in video captioning, which aims to identify hallucinations in video captions at the span level (i.e., individual words or phrases). This allows for a more detailed analysis of hallucinations than the existing sentence-level hallucination detection task. To establish a benchmark for hallucination localization, we construct HLVC-Dataset, a carefully curated dataset created by manually annotating 1,167 video–caption pairs from VideoLLM-generated captions. We further implement a VideoLLM-based baseline method and conduct quantitative and qualitative evaluations to benchmark current performance on hallucination localization.
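The abstract does not specify how span-level predictions are scored. As a minimal sketch, assuming gold annotations and model predictions are both given as character-offset `(start, end)` spans over a caption, localization quality could be measured with character-level precision, recall, and F1 (the function and span format here are illustrative assumptions, not the paper's actual metric):

```python
# Hypothetical scoring sketch for span-level hallucination localization.
# Assumption: hallucinated regions are (start, end) character-offset spans,
# end-exclusive; this is NOT necessarily the metric used in the paper.

def covered_positions(spans):
    """Expand (start, end) spans into the set of covered character positions."""
    positions = set()
    for start, end in spans:
        positions.update(range(start, end))
    return positions

def localization_f1(pred_spans, gold_spans):
    """Character-level F1 between predicted and gold hallucinated spans."""
    pred = covered_positions(pred_spans)
    gold = covered_positions(gold_spans)
    if not pred and not gold:
        return 1.0  # no hallucination predicted, none annotated
    if not pred or not gold:
        return 0.0
    tp = len(pred & gold)
    precision = tp / len(pred)
    recall = tp / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Example: gold span covers offsets [10, 20); prediction covers half of it.
print(localization_f1([(10, 15)], [(10, 20)]))  # → 0.666...
```

A character-level (rather than exact-span-match) score gives partial credit for overlapping but imperfectly bounded predictions, which matters when hallucination boundaries are themselves ambiguous for annotators.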