🤖 AI Summary
Modeling multi-step mathematical word problems (MWPs) in low-resource languages like Bengali is challenging due to data scarcity and complex reasoning requirements. Method: We introduce SOMADHAN, the first high-quality, human-annotated Bengali MWP dataset comprising 8,792 problems, and propose a collaborative optimization framework tailored for low-resource settings—integrating Bengali-adapted chain-of-thought (CoT) prompting with LoRA-based efficient fine-tuning. Contribution/Results: Extensive evaluation across GPT-4o, LLaMA-3.3 (70B), and Qwen demonstrates that LLaMA-3.3 70B achieves 88% accuracy under few-shot CoT, substantially outperforming baselines. This work establishes the first systematic benchmark and methodological advancement for multi-step mathematical reasoning in Bengali, providing critical infrastructure—including a curated dataset and a reproducible low-resource adaptation paradigm—for mathematical NLP research in under-resourced languages.
📝 Abstract
Solving Bengali Math Word Problems (MWPs) remains a major challenge in natural language processing (NLP) due to the language's low-resource status and the multi-step reasoning required. Existing models struggle with complex Bengali MWPs, largely because no human-annotated Bengali dataset has previously addressed this task. This gap has limited progress in Bengali mathematical reasoning. To address this, we created SOMADHAN, a dataset of 8792 complex Bengali MWPs with manually written, step-by-step solutions. We designed this dataset to support reasoning-focused evaluation and model development in a linguistically underrepresented context. Using SOMADHAN, we evaluated a range of large language models (LLMs) - including GPT-4o, GPT-3.5 Turbo, LLaMA series models, Deepseek, and Qwen - through both zero-shot and few-shot prompting with and without Chain of Thought (CoT) reasoning. CoT prompting consistently improved performance over standard prompting, especially in tasks requiring multi-step logic. LLaMA-3.3 70B achieved the highest accuracy of 88% with few-shot CoT prompting. We also applied Low-Rank Adaptation (LoRA) to fine-tune models efficiently, enabling them to adapt to Bengali MWPs with minimal computational cost. Our work fills a critical gap in Bengali NLP by providing a high-quality reasoning dataset and a scalable framework for solving complex MWPs. We aim to advance equitable research in low-resource languages and enhance reasoning capabilities in educational and language technologies.