IRIS: Interleaved Reinforcement with Incremental Staged Curriculum for Cross-Lingual Mathematical Reasoning

📅 2026-04-27

🤖 AI Summary
This work addresses inconsistent step-by-step reasoning in multilingual mathematical problem solving, particularly for low-resource Indian languages, where cross-lingual transfer from English remains limited. The authors propose a two-axis training framework: on the vertical axis, supervised fine-tuning proceeds over progressively harder problems; on the horizontal axis, reverse-curriculum reinforcement learning reduces reliance on step-by-step guidance. Training uses a composite reward that combines correctness, step-wise alignment, continuity, and numeric incentives, optimized via Group Relative Policy Optimization (GRPO). The authors also release CL-Math, a dataset of 29k problems with step-level annotations in English, Hindi, and Marathi. Across standard benchmarks and curated multilingual test sets, the method improves math reasoning performance, with substantial gains in low-resource and bilingual settings and modest improvements for high-resource languages.

📝 Abstract
Curriculum learning helps language models tackle complex reasoning by gradually increasing task difficulty. However, it often fails to generate consistent step-by-step reasoning, especially in multilingual and low-resource settings where cross-lingual transfer from English to Indian languages remains limited. We propose IRIS: Interleaved Reinforcement with Incremental Staged Curriculum, a two-axis framework that combines Supervised Fine-Tuning on progressively harder problems (vertical axis) with Reverse Curriculum Reinforcement Learning to reduce reliance on step-by-step guidance (horizontal axis). We design a composite reward combining correctness, step-wise alignment, continuity, and numeric incentives, optimized via Group Relative Policy Optimization (GRPO). We release CL-Math, a dataset of 29k problems with step-level annotations in English, Hindi, and Marathi. Across standard benchmarks and curated multilingual test sets, IRIS consistently improves performance, with strong results on math reasoning tasks and substantial gains in low-resource and bilingual settings, alongside modest improvements in high-resource languages.
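
To make the reward design concrete, here is a minimal Python sketch of how a composite reward could feed GRPO-style group-relative advantages. The abstract names the four components but gives neither their weights nor how they are scored, so the `RewardWeights` values, the component keys, and the function names below are all hypothetical. The advantage normalization (reward minus group mean, divided by group standard deviation) follows the usual GRPO formulation; the use of the population standard deviation is an assumption.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical weights: the abstract does not report actual values.
    correctness: float = 1.0
    alignment: float = 0.5
    continuity: float = 0.25
    numeric: float = 0.25


def composite_reward(components, weights=None):
    """Weighted sum of the four reward components named in the abstract.

    `components` maps each component name to a score; how each score is
    computed is not specified in the source and is left to the caller.
    """
    w = weights or RewardWeights()
    return (
        w.correctness * components["correctness"]
        + w.alignment * components["alignment"]
        + w.continuity * components["continuity"]
        + w.numeric * components["numeric"]
    )


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of sampled completions.

    Each completion's advantage is its reward minus the group mean,
    divided by the group standard deviation (plus eps for stability).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Two sampled completions for the same prompt: one fully rewarded, one not.
good = {"correctness": 1.0, "alignment": 1.0, "continuity": 1.0, "numeric": 1.0}
bad = {"correctness": 0.0, "alignment": 0.0, "continuity": 0.0, "numeric": 0.0}
group_rewards = [composite_reward(good), composite_reward(bad)]
advantages = group_relative_advantages(group_rewards)
```

In this sketch the better completion receives a positive advantage and the worse one a negative advantage, and a group whose completions all score the same yields zero advantages, so such a group contributes no learning signal.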
Problem

Research questions and friction points this paper is trying to address.

cross-lingual transfer
mathematical reasoning
low-resource languages
curriculum learning
multilingual reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Curriculum Learning
Reinforcement Learning
Cross-lingual Reasoning
Mathematical Reasoning
Low-resource Languages