🤖 AI Summary
Existing hallucination research focuses narrowly on either the cross-lingual or the cross-modal dimension in isolation, lacking systematic investigation of their joint occurrence. Method: We introduce CCHall, the first benchmark for joint cross-lingual and cross-modal hallucination detection, explicitly defining and evaluating hallucinations in large language models (LLMs) under mixed multilingual-multimodal inputs. Built upon adversarial test sets derived from multilingual text–image pairs, CCHall combines human verification with automated metrics to establish a rigorous evaluation protocol. Results: Experiments across leading open-source and closed-source LLMs reveal significantly elevated hallucination rates and severely limited generalization in this joint setting. CCHall bridges a critical gap in multidimensional hallucination assessment, providing both a foundational benchmark and an analytical framework for improving the robustness of multilingual multimodal models.
📝 Abstract
Investigating hallucination issues in large language models (LLMs) in cross-lingual and cross-modal scenarios can greatly advance their large-scale deployment in real-world applications. Nevertheless, current studies are limited to a single scenario, either cross-lingual or cross-modal, leaving a gap in the exploration of hallucinations in joint cross-lingual and cross-modal settings. Motivated by this, we introduce a novel joint Cross-lingual and Cross-modal Hallucination benchmark (CCHall) to fill this gap. Specifically, CCHall simultaneously incorporates both cross-lingual and cross-modal hallucination scenarios, and can therefore be used to assess the cross-lingual and cross-modal capabilities of LLMs. Furthermore, we conduct a comprehensive evaluation on CCHall, covering both mainstream open-source and closed-source LLMs. The experimental results show that current LLMs still struggle with CCHall. We hope CCHall can serve as a valuable resource for assessing LLMs in joint cross-lingual and cross-modal scenarios.