M3TQA: Massively Multilingual Multitask Table Question Answering

📅 2025-08-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing multilingual table understanding research is heavily skewed toward English, with a critical lack of high-quality benchmarks for low-resource languages. To address the geographic and scale imbalances in language coverage, we introduce m3TQA-Instruct, the first large-scale, multitask table question answering benchmark spanning 97 languages. Our method features a high-fidelity, six-step LLM-based translation pipeline leveraging DeepSeek and GPT-4o, integrated with back-translation validation and human verification to ensure cross-lingual data quality; it supports four complex table reasoning task types. The pipeline achieves high translation fidelity, with a median BLEU score of 60.19 under back-translation. Experiments demonstrate that unsupervised QA data synthesized from this benchmark significantly enhances large language models' cross-lingual performance, particularly for low-resource languages.
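The back-translation validation step described above (translate, back-translate, score the round trip against the source) can be sketched as below. This is an illustrative assumption of how such a quality gate might work, not the paper's exact pipeline: the `bleu` function is a simplified unsmoothed sentence-level BLEU, and `passes_backtranslation_check` with a 60.0 threshold is a hypothetical gate loosely motivated by the reported median BLEU of 60.19.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU (0-100): geometric mean of modified
    n-gram precisions up to max_n, times a brevity penalty. No smoothing."""
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # Clip candidate n-gram counts by their counts in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        if overlap == 0:
            return 0.0  # without smoothing, one empty precision zeroes BLEU
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return 100.0 * bp * math.exp(sum(log_precisions) / max_n)

def passes_backtranslation_check(source, back_translated, threshold=60.0):
    """Hypothetical gate: keep a translated item only if back-translating it
    recovers the source above a BLEU threshold."""
    return bleu(back_translated, source) >= threshold
```

In a real pipeline, a tuned implementation such as sacreBLEU (with smoothing and language-appropriate tokenization) would replace this sketch, since whitespace tokenization is a poor fit for many of the 97 languages covered.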

📝 Abstract
Tabular data is a fundamental component of real-world information systems, yet most research in table understanding remains confined to English, leaving multilingual comprehension significantly underexplored. Existing multilingual table benchmarks suffer from geolinguistic imbalance: they overrepresent certain languages and lack sufficient scale for rigorous cross-lingual analysis. To address these limitations, we introduce a comprehensive framework for massively multilingual multitask table question answering, featuring m3TQA-Instruct, a large-scale benchmark spanning 97 languages across diverse language families, including underrepresented and low-resource languages. We construct m3TQA by curating 50 real-world tables in Chinese and English, then applying a robust six-step LLM-based translation pipeline powered by DeepSeek and GPT-4o, achieving high translation fidelity with a median BLEU score of 60.19 as validated through back-translation. The benchmark includes 2,916 professionally annotated question-answering pairs across four tasks designed to evaluate nuanced table reasoning capabilities. Experiments on state-of-the-art LLMs reveal critical insights into cross-lingual generalization, demonstrating that synthetically generated, unannotated QA data can significantly boost performance, particularly for low-resource languages. M3T-Bench establishes a new standard for multilingual table understanding, providing both a challenging evaluation platform and a scalable methodology for future research.
Problem

Research questions and friction points this paper is trying to address.

Addressing geolinguistic imbalance in multilingual table benchmarks
Developing a comprehensive framework for multilingual table question answering
Evaluating cross-lingual generalization in table reasoning capabilities
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multilingual table benchmark spanning 97 languages
LLM-based translation pipeline using DeepSeek and GPT-4o
Synthetic unannotated QA data boosts low-resource performance
👥 Authors
Daixin Shu (CCSE, Beihang University)
Jian Yang (CCSE, Beihang University)
Zhenhe Wu (CCSE, Beihang University)
Xianjie Wu (CCSE, Beihang University)
Xianfu Cheng (CCSE, Beihang University)
Xiangyuan Guan (CCSE, Beihang University)
Yanghai Wang (CCSE, Beihang University)
Pengfei Wu (Nanjing University)
Tingyang Yang (CCSE, Beihang University)
Hualei Zhu (CCSE, Beihang University)
Wei Zhang (CCSE, Beihang University)
Ge Zhang (M-A-P)
Jiaheng Liu (Nanjing University)
Zhoujun Li (Beihang University)
Artificial Intelligence · Natural Language Processing · Network Security