Bangla-Bayanno: A 52K-Pair Bengali Visual Question Answering Dataset with LLM-Assisted Translation Refinement

📅 2025-08-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing Bengali Visual Question Answering (VQA) datasets suffer from limited human annotation, monolithic answer formats, poor translation quality, and the absence of high-quality open benchmarks, severely hindering low-resource multimodal research. To address these limitations, the authors introduce Bangla-Bayanno, an open-source Bengali VQA benchmark comprising over 4,750 images and 52,650 question-answer pairs spanning nominal, quantitative, and polar (yes/no) question types. The methodology employs a multilingual large language model-assisted translation and refinement pipeline, validated by human review to ensure semantic fidelity and linguistic naturalness. A three-way answer-type taxonomy supports diverse evaluation protocols. Bangla-Bayanno fills a critical gap in low-resource language VQA benchmarks and establishes a foundation for inclusive, multilingual multimodal AI research.

📝 Abstract
In this paper, we introduce Bangla-Bayanno, an open-ended Visual Question Answering (VQA) dataset in Bangla, a widely used language that remains low-resource in multimodal AI research. Most existing datasets are either manually annotated with an emphasis on a specific domain, query type, or answer type, or are constrained by niche answer formats. To mitigate human-induced errors and guarantee lucidity, we implemented a multilingual LLM-assisted translation refinement pipeline, which overcomes the issue of low-quality translations from multilingual sources. The dataset comprises 52,650 question-answer pairs across 4,750+ images. Questions are classified into three distinct answer types: nominal (short descriptive), quantitative (numeric), and polar (yes/no). Bangla-Bayanno provides the most comprehensive open-source, high-quality VQA benchmark in Bangla, aiming to advance research in low-resource multimodal learning and facilitate the development of more inclusive AI systems.
Problem

Research questions and friction points this paper is trying to address.

Creating a high-quality Bengali VQA dataset
Overcoming low-quality multilingual translation issues
Advancing low-resource multimodal AI research
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-assisted translation refinement pipeline
Multilingual sources for dataset creation
Three distinct answer type classifications
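The three-way answer-type taxonomy above (nominal, quantitative, polar) can be illustrated with a minimal sketch. The field names and example pairs below are hypothetical, since the paper's actual data schema is not specified here:

```python
# Hypothetical QA-pair records illustrating the three answer types
# described in Bangla-Bayanno (field names are assumptions, not the
# dataset's actual schema).
qa_pairs = [
    {"question": "ছবিতে কী দেখা যাচ্ছে?",        # "What is shown in the image?"
     "answer": "একটি নৌকা",                      # "a boat"
     "answer_type": "nominal"},
    {"question": "ছবিতে কয়টি পাখি আছে?",         # "How many birds are in the image?"
     "answer": "৩",                              # "3"
     "answer_type": "quantitative"},
    {"question": "ছবিতে কি কোনো মানুষ আছে?",     # "Is there a person in the image?"
     "answer": "হ্যাঁ",                           # "yes"
     "answer_type": "polar"},
]

def count_by_type(pairs):
    """Tally question-answer pairs by their answer-type label."""
    counts = {}
    for pair in pairs:
        counts[pair["answer_type"]] = counts.get(pair["answer_type"], 0) + 1
    return counts

print(count_by_type(qa_pairs))
# {'nominal': 1, 'quantitative': 1, 'polar': 1}
```

Such a per-type breakdown is what enables the type-specific evaluation protocols the paper mentions (e.g., exact-match on polar answers versus numeric comparison on quantitative ones).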
Mohammed Rakibul Hasan
Research Scientist @ SpontAlign | CS @NSU
Responsible AI · NLP · LLM · QC
Rafi Majid
Department of Electrical and Computer Engineering, North South University, Bashundhara, Dhaka-1229, Bangladesh
Ahanaf Tahmid
Department of Electrical and Computer Engineering, North South University, Bashundhara, Dhaka-1229, Bangladesh