Leveraging Large Language Models for Bengali Math Word Problem Solving with Chain of Thought Reasoning

📅 2025-05-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Solving multi-step math word problems (MWPs) in low-resource languages like Bengali is challenging due to data scarcity and the complex reasoning involved. Method: The authors introduce SOMADHAN, the first high-quality, human-annotated Bengali MWP dataset, comprising 8,792 problems with step-by-step solutions, and combine Bengali-adapted chain-of-thought (CoT) prompting with parameter-efficient LoRA fine-tuning for low-resource settings. Contribution/Results: Evaluation across GPT-4o, GPT-3.5 Turbo, LLaMA-series models, DeepSeek, and Qwen shows that LLaMA-3.3 70B achieves 88% accuracy with few-shot CoT prompting, substantially outperforming standard prompting. The work establishes the first systematic benchmark for multi-step mathematical reasoning in Bengali and provides critical infrastructure, a curated dataset and a reproducible low-resource adaptation recipe, for mathematical NLP research in under-resourced languages.

📝 Abstract
Solving Bengali Math Word Problems (MWPs) remains a major challenge in natural language processing (NLP) due to the language's low-resource status and the multi-step reasoning required. Existing models struggle with complex Bengali MWPs, largely because no human-annotated Bengali dataset has previously addressed this task. This gap has limited progress in Bengali mathematical reasoning. To address this, we created SOMADHAN, a dataset of 8,792 complex Bengali MWPs with manually written, step-by-step solutions. We designed this dataset to support reasoning-focused evaluation and model development in a linguistically underrepresented context. Using SOMADHAN, we evaluated a range of large language models (LLMs), including GPT-4o, GPT-3.5 Turbo, LLaMA-series models, DeepSeek, and Qwen, through both zero-shot and few-shot prompting with and without Chain of Thought (CoT) reasoning. CoT prompting consistently improved performance over standard prompting, especially on tasks requiring multi-step logic. LLaMA-3.3 70B achieved the highest accuracy of 88% with few-shot CoT prompting. We also applied Low-Rank Adaptation (LoRA) to fine-tune models efficiently, enabling them to adapt to Bengali MWPs at minimal computational cost. Our work fills a critical gap in Bengali NLP by providing a high-quality reasoning dataset and a scalable framework for solving complex MWPs. We aim to advance equitable research in low-resource languages and enhance reasoning capabilities in educational and language technologies.
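To make the few-shot CoT setup concrete, here is a minimal sketch against the OpenAI chat completions API. The Bengali exemplar, its step-by-step rationale, and the prompt wording are illustrative placeholders, not items from SOMADHAN or the paper's actual template.

```python
# Minimal sketch of few-shot CoT prompting for a Bengali MWP.
# The exemplar and rationale below are illustrative placeholders,
# not problems taken from the SOMADHAN dataset.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# One worked exemplar: "Rahim has 5 apples, buys 3 more; how many in total?"
FEW_SHOT_COT = """\
প্রশ্ন: রহিমের কাছে ৫টি আপেল আছে। সে আরও ৩টি আপেল কিনল। তার মোট কতটি আপেল আছে?
সমাধান: রহিমের শুরুতে ৫টি আপেল ছিল। সে আরও ৩টি কিনল। ৫ + ৩ = ৮। উত্তর: ৮
"""

def solve_mwp(problem: str) -> str:
    """Prompt the model to reason step by step before stating the answer."""
    response = client.chat.completions.create(
        model="gpt-4o",  # one of the models evaluated in the paper
        messages=[
            {"role": "system",
             "content": "Solve the Bengali math word problem step by step, "
                        "then state the final answer."},
            {"role": "user",
             "content": FEW_SHOT_COT + "\nপ্রশ্ন: " + problem + "\nসমাধান:"},
        ],
        temperature=0.0,  # deterministic decoding for evaluation
    )
    return response.choices[0].message.content

# "A basket has 12 mangoes. If 4 are sold, how many remain?"
print(solve_mwp("একটি ঝুড়িতে ১২টি আম আছে। ৪টি আম বিক্রি হলে কতটি বাকি থাকে?"))
```

The same loop can be pointed at any of the evaluated chat models; dropping the exemplar from the prompt recovers the zero-shot condition the abstract compares against.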
Problem

Research questions and friction points this paper is trying to address.

Solving Bengali math word problems with limited resources
Creating a dataset for Bengali multi-step reasoning tasks
Evaluating LLMs with Chain of Thought for Bengali MWPs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Created SOMADHAN dataset for Bengali MWPs
Used Chain of Thought reasoning with LLMs
Applied LoRA for efficient model fine-tuning (see the sketch after this list)
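The LoRA step can be sketched with the Hugging Face PEFT library. The base checkpoint and hyperparameters below (rank, alpha, target modules) are illustrative assumptions, not the values reported in the paper.

```python
# Minimal sketch of LoRA adaptation for a causal LM using Hugging Face PEFT.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

# Placeholder checkpoint; the paper's strongest results use LLaMA-3.3 70B.
base = "meta-llama/Llama-3.1-8B"

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16)

lora_config = LoraConfig(
    r=8,                                  # rank of the low-rank update matrices
    lora_alpha=16,                        # scaling factor for the update
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small adapter matrices are trained
```

Because only the low-rank adapter matrices are updated, the full weight matrices stay frozen, which is what keeps adaptation to Bengali MWPs cheap enough for low-resource settings.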
Bidyarthi Paul
Adjunct Lecturer, Southeast University
Natural Language Processing · LLM · GenAI
Jalisha Jashim Era
Department of Computer Science and Engineering, Ahsanullah University of Science and Technology, Tejgaon, Dhaka
Mirazur Rahman Zim
Department of Computer Science and Engineering, Ahsanullah University of Science and Technology, Tejgaon, Dhaka
Tahmid Sattar Aothoi
Department of Computer Science and Engineering, Ahsanullah University of Science and Technology, Tejgaon, Dhaka
Faisal Muhammad Shah
Associate Professor, Dept. of Computer Science and Engineering, Ahsanullah University of Science and Technology
Deep Learning · NLP · Computer Vision