Fine-grained and Explainable Factuality Evaluation for Multimodal Summarization

📅 2024-02-18
🏛️ arXiv.org
📈 Citations: 3
✨ Influential: 0
🤖 AI Summary
To address factual inconsistency in multimodal summarization, this paper proposes the first fine-grained, interpretable dual-path factuality evaluation framework, jointly supporting reference-dependent supervised evaluation and reference-free open-scenario assessment. Methodologically, it integrates multimodal alignment modeling, cross-modal factual verification, and explainable score decomposition to enable error localization and natural-language explanation generation. Evaluated across multiple benchmarks, the framework substantially outperforms conventional metrics (e.g., BLEU, ROUGE) and achieves a 32% improvement in correlation with human judgments. The code and dataset are publicly released, establishing a new paradigm and practical toolkit for factuality research in multimodal summarization.

📝 Abstract
Multimodal summarization aims to generate a concise summary based on the input text and image. However, existing methods can produce output that is not factual. To evaluate the factuality of multimodal summarization models, we propose two fine-grained and explainable evaluation frameworks (FALLACIOUS) for different application scenarios: a reference-based factuality evaluation framework and a reference-free factuality evaluation framework. Notably, the reference-free framework does not need ground truth and therefore applies to a wider range of scenarios. To evaluate the effectiveness of the proposed frameworks, we compute the correlation between our frameworks and other metrics. The experimental results show the effectiveness of our proposed method. We will release our code and dataset via GitHub.
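The correlation step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which correlation coefficient is used, so Pearson and Spearman here are assumptions, and the function names and score lists are hypothetical toy values rather than anything from the paper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical per-summary scores (made up for illustration only).
framework_scores = [0.9, 0.4, 0.7, 0.2, 0.6]
baseline_scores = [0.8, 0.5, 0.6, 0.1, 0.7]

if __name__ == "__main__":
    print("Pearson :", pearson(framework_scores, baseline_scores))
    print("Spearman:", spearman(framework_scores, baseline_scores))
```

Spearman is often preferred when only the ranking of summaries matters, since it ignores the scale of each metric; Pearson additionally checks for a linear relationship between the raw scores.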
Problem

Research questions and friction points this paper is trying to address.

Evaluating factuality in multimodal summarization outputs
Addressing non-factual content in text-image summary generation
Providing explainable frameworks for reference-based and reference-free evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-grained, explainable evaluation frameworks (FALLACIOUS)
Reference-based and reference-free factuality assessment methods
Reference-free evaluation needs no ground truth, widening its application scenarios
Liqiang Jing
University of Texas at Dallas
Multimedia Analysis, Multimodal, Natural Language Processing
Jingxuan Zuo
The University of Texas at Dallas
Yue Zhang
The University of Texas at Dallas