GanitLLM: Difficulty-Aware Bengali Mathematical Reasoning through Curriculum-GRPO

📅 2026-01-11
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the poor performance of existing large language models on multi-step mathematical reasoning in low-resource languages such as Bengali, where models often fall back on English-based reasoning followed by translation, and where reinforcement learning struggles to converge under sparse reward conditions. To tackle these challenges, the authors introduce Ganit, the first Bengali mathematical reasoning dataset annotated with automatically assigned difficulty labels, and propose Curriculum-GRPO, a training framework that combines supervised fine-tuning with Group Relative Policy Optimization. The framework incorporates pass@k-based difficulty estimation, a verifiable reward mechanism, and difficulty-aware curriculum sampling. Experiments show that the resulting model, GanitLLM-4B, achieves accuracy gains of 8.0 and 7.0 percentage points on Bn-MGSM and Bn-MSVAMP, respectively, increases the proportion of Bengali reasoning tokens from 14% to over 88%, and reduces average solution length from 943 to 193 words.
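
The pass@k-based difficulty estimation can be pictured with a minimal sketch like the one below. `sample_solution` and `check_answer` are hypothetical stand-ins for the evaluator model's generation call and a numeric-answer verifier; the value of k and the bucket thresholds are illustrative assumptions, not the paper's settings.

```python
from typing import Callable


def estimate_difficulty(problem: str,
                        gold_answer: str,
                        sample_solution: Callable[[str], str],
                        check_answer: Callable[[str, str], bool],
                        k: int = 8) -> str:
    """Bucket a problem by the evaluator model's empirical pass@k.

    Both callables are hypothetical placeholders: `sample_solution`
    draws one solution from a strong evaluator model, and
    `check_answer` verifies its final numeric answer.
    """
    # Fraction of k sampled solutions whose final answer is correct.
    correct = sum(check_answer(sample_solution(problem), gold_answer)
                  for _ in range(k))
    pass_at_k = correct / k

    # Assumed thresholds for turning pass@k into a difficulty tag.
    if pass_at_k >= 0.75:
        return "easy"
    if pass_at_k >= 0.25:
        return "medium"
    return "hard"
```

These tags can then drive difficulty-aware curriculum sampling, e.g. weighting easier problems early in GRPO training and shifting probability mass toward harder ones as training progresses.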

📝 Abstract
We present a Bengali mathematical reasoning model called GanitLLM (named after the Bangla word for mathematics, "Ganit"), together with a new difficulty-aware Bengali math corpus and a curriculum-based GRPO pipeline. Bengali is one of the world's most widely spoken languages, yet existing LLMs either reason in English and then translate, or simply fail on multi-step Bengali math, in part because reinforcement learning recipes are tuned for high-resource languages and collapse under reward sparsity in low-resource settings. To address this, we construct Ganit, a rigorously filtered and decontaminated Bengali math dataset with automatic difficulty tags derived from the pass@k of a strong evaluator model. Building on this dataset, we propose Curriculum-GRPO, which combines multi-stage training (SFT + GRPO) with difficulty-aware sampling and verifiable rewards for format, numerical correctness, and Bengali reasoning. On Bn-MGSM and Bn-MSVAMP, GanitLLM-4B improves over its Qwen3-4B base by +8 and +7 accuracy points, respectively, while increasing the percentage of Bengali reasoning tokens from 14% to over 88% and reducing average solution length from 943 to 193 words.
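
A minimal sketch of how the three verifiable reward components named in the abstract (format, numerical correctness, Bengali reasoning) might be combined into a single scalar reward. The `<answer>` tag format, the component weights, and the Bengali Unicode-range heuristic are illustrative assumptions, not the paper's exact reward function.

```python
import re

# Bengali Unicode block (U+0980 to U+09FF); heuristic for detecting Bengali text.
BENGALI_CHAR = re.compile(r"[\u0980-\u09FF]")


def verifiable_reward(completion: str, gold_answer: str) -> float:
    """Toy composite reward: format + numeric correctness + Bengali share.

    Assumed recipe: the model must place its final answer inside an
    <answer>...</answer> tag; correctness is exact match on that span;
    the language term is the fraction of non-whitespace characters
    falling in the Bengali Unicode block.
    """
    # Format reward: final answer appears inside an <answer> tag.
    m = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    format_ok = 1.0 if m else 0.0

    # Correctness reward: extracted answer matches the gold answer.
    predicted = m.group(1).strip() if m else ""
    correct = 1.0 if predicted == gold_answer.strip() else 0.0

    # Language reward: share of Bengali characters in the reasoning trace.
    chars = [c for c in completion if not c.isspace()]
    bn_share = sum(bool(BENGALI_CHAR.match(c)) for c in chars) / max(len(chars), 1)

    # Assumed weights; the paper's weighting may differ.
    return 0.2 * format_ok + 0.6 * correct + 0.2 * bn_share
```

In a GRPO-style setup, this scalar would be computed per sampled completion within a group and normalized into relative advantages before the policy update.
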
Problem

Research questions and friction points this paper is trying to address.

Bengali mathematical reasoning
low-resource languages
reward sparsity
multi-step math
language-specific reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Curriculum-GRPO
difficulty-aware sampling
Bengali mathematical reasoning
verifiable rewards
low-resource LLM
Shubhashis Roy Dipta
University of Maryland, Baltimore County
Natural Language Processing · Reasoning · Multimodal Understanding
Khairul Mahbub
University of North Carolina at Charlotte
Nadia Najjar
University of North Carolina at Charlotte