BanglaSummEval: Reference-Free Factual Consistency Evaluation for Bangla Summarization

📅 2026-02-18
🤖 AI Summary
This work proposes a reference-free, end-to-end framework for evaluating factual consistency in Bangla (Bengali) summarization, a low-resource language setting. Built on a single multilingual instruction-tuned model, the approach automatically generates questions from source documents and extracts the corresponding answers from summaries. It then assesses factual accuracy and content coverage by combining question importance weighting with BERTScore-Recall for semantic answer matching. Evaluated on 300 human-written Bangla summaries from the education and healthcare domains, the method correlates strongly with expert judgments (Pearson’s r = 0.694, Spearman’s ρ = 0.763) and provides interpretable, reliable scores without relying on reference summaries.
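The summary does not give the exact scoring formula, so the following is only a minimal sketch of importance-weighted, QA-based scoring under stated assumptions: each generated question carries an importance weight in [0, 1], an answer extracted from the source, and an answer extracted from the summary (`None` when the summary cannot answer it), and per-question similarities are combined as an importance-weighted mean. The `QAItem` structure and all function names are hypothetical. `token_recall` is a plain lexical-overlap stand-in for BERTScore-Recall, which instead compares contextual embeddings of the two answers.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class QAItem:
    """One generated question and its answers (hypothetical structure)."""
    question: str
    source_answer: str             # answer extracted from the source document
    summary_answer: Optional[str]  # answer from the summary; None if unanswerable
    importance: float              # question importance weight, assumed in [0, 1]


def token_recall(reference: str, candidate: str) -> float:
    """Stand-in for BERTScore-Recall: share of reference tokens found in the candidate.

    This is lexical overlap only, NOT BERTScore; it just illustrates the
    recall direction (how much of the source answer the summary recovers).
    """
    ref_tokens = reference.split()
    if not ref_tokens:
        return 0.0
    cand_tokens = set(candidate.split())
    hits = sum(1 for tok in ref_tokens if tok in cand_tokens)
    return hits / len(ref_tokens)


def weighted_consistency(items: Sequence[QAItem]) -> float:
    """Importance-weighted mean of per-question answer similarity (assumed aggregation)."""
    total_weight = sum(item.importance for item in items)
    if total_weight == 0:
        return 0.0
    score = 0.0
    for item in items:
        # Questions the summary cannot answer contribute zero similarity.
        if item.summary_answer is None:
            sim = 0.0
        else:
            sim = token_recall(item.source_answer, item.summary_answer)
        score += item.importance * sim
    return score / total_weight


if __name__ == "__main__":
    items = [
        QAItem("Where is the hospital?", "Dhaka", "Dhaka", 1.0),
        QAItem("When did it open?", "in 2020", "2020", 0.5),
        QAItem("Which disease is treated?", "dengue fever", None, 0.5),
    ]
    print(f"{weighted_consistency(items):.3f}")  # prints 0.625
```

Replacing `token_recall` with a real BERTScore-Recall call (for example, the recall tensor returned by the `bert_score` package's `score` function) would leave the weighted aggregation unchanged.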

📝 Abstract
Evaluating factual consistency is essential for reliable text summarization, particularly in high-stakes domains such as healthcare and news. However, most existing evaluation metrics overlook Bangla, a widely spoken yet under-resourced language, and often depend on reference summaries. We introduce BanglaSummEval, a reference-free, question-answering-based framework for evaluating factual consistency in Bangla summarization. The proposed method assesses both factual accuracy and content coverage through automatically generated questions and answers derived from the source document and the summary. A single multilingual instruction-tuned language model handles question generation, question answering, candidate answer extraction, and question importance weighting. This unified design reduces system complexity and computational cost. To capture semantic consistency beyond surface-level overlap, we use BERTScore-Recall for answer comparison. We validate BanglaSummEval on 300 human-written summaries from educational and medical domains, demonstrating strong correlation with expert human judgments (Pearson's $r = 0.694$, Spearman's $\rho = 0.763$). By providing interpretable, step-wise diagnostics alongside reliable evaluation scores, BanglaSummEval offers a practical and transparent solution for factual consistency evaluation in low-resource language settings.
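The validation reports Pearson's r and Spearman's ρ between metric scores and expert judgments. As a hedged sketch (not the authors' evaluation code), both coefficients can be computed with the standard library alone: Pearson's r is the normalized covariance of the two score lists, and Spearman's ρ is Pearson's r applied to their ranks, with tied values sharing the average of their rank positions. The function names below are hypothetical.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's r between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(vx * vy)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson's r computed on tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    metric_scores = [0.2, 0.5, 0.9, 0.4]  # hypothetical metric outputs
    human_scores = [1, 3, 5, 3]           # hypothetical expert ratings
    print(pearson(metric_scores, human_scores))
    print(spearman(metric_scores, human_scores))
```

In practice, `scipy.stats.pearsonr` and `scipy.stats.spearmanr` compute the same coefficients (the latter also averages tied ranks) and additionally report p-values.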
Problem

Research questions and friction points this paper is trying to address.

factual consistency
Bangla summarization
reference-free evaluation
low-resource language
text summarization
Innovation

Methods, ideas, or system contributions that make the work stand out.

reference-free evaluation
factual consistency
question-answering framework
low-resource language
multilingual instruction-tuned model
Ahmed Rafid
Department of Computer Science and Engineering, Islamic University of Technology, Bangladesh
Rumman Adib
Department of Computer Science and Engineering, Islamic University of Technology, Bangladesh
Fariya Ahmed
Department of Computer Science and Engineering, Islamic University of Technology, Bangladesh
Ajwad Abrar
Junior Lecturer, IUT
Natural Language Processing, Human-Computer Interaction, Software Engineering
Mohammed Saidul Islam
Lecturer, CSE, Islamic University of Technology
Natural Language Processing, Computer Vision, Machine Learning