IRIS: Interleaved Reinforcement with Incremental Staged Curriculum for Cross-Lingual Mathematical Reasoning

📅 2026-04-27

🤖 AI Summary

This work addresses inconsistent step-by-step reasoning in multilingual mathematical problem solving, particularly for low-resource Indian languages, where cross-lingual transfer from English remains limited. The authors propose IRIS, a two-axis training framework: on the vertical axis, supervised fine-tuning proceeds over progressively harder problems; on the horizontal axis, Reverse Curriculum Reinforcement Learning reduces reliance on explicit step-by-step guidance. Training uses a composite reward that combines correctness, step-wise alignment, continuity, and numeric incentives, optimized with Group Relative Policy Optimization (GRPO). The authors also release CL-Math, a dataset of 29k problems with step-level annotations in English, Hindi, and Marathi. Across standard benchmarks and curated multilingual test sets, IRIS consistently improves performance, with substantial gains in low-resource and bilingual settings and modest improvements in high-resource languages.
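To make the horizontal axis concrete, here is a minimal sketch of one common reading of a reverse curriculum: the prompt starts with most of the annotated solution steps revealed, and later stages hide more of them so the model has to produce a longer share of the reasoning itself. The function name `hinted_prompt`, the `hidden` parameter, and the newline-joined prompt layout are illustrative assumptions; the summary does not specify how IRIS builds its prompts or schedules its stages.

```python
from typing import List


def hinted_prompt(problem: str, gold_steps: List[str], hidden: int) -> str:
    """Build a prompt that reveals all but the last `hidden` gold steps.

    Hypothetical helper: the layout and the schedule are assumptions,
    not the procedure described by the IRIS authors.
    """
    # Never reveal a negative number of steps when `hidden` exceeds the solution length.
    keep = max(0, len(gold_steps) - hidden)
    lines = [problem] + gold_steps[:keep]
    return "\n".join(lines)


if __name__ == "__main__":
    steps = ["Step 1: 3 + 4 = 7", "Step 2: 7 * 2 = 14", "Step 3: 14 - 5 = 9"]
    # Walk the curriculum from heavy guidance (1 hidden step) to none (all hidden).
    for hidden in range(1, len(steps) + 1):
        print(f"--- hidden={hidden} ---")
        print(hinted_prompt("Compute (3 + 4) * 2 - 5.", steps, hidden))
```

In this sketch the final stage hides every step, so the prompt reduces to the bare problem statement, which matches the stated goal of reducing reliance on step-by-step guidance.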

📝 Abstract

Curriculum learning helps language models tackle complex reasoning by gradually increasing task difficulty. However, it often fails to generate consistent step-by-step reasoning, especially in multilingual and low-resource settings where cross-lingual transfer from English to Indian languages remains limited. We propose IRIS: Interleaved Reinforcement with Incremental Staged Curriculum, a two-axis framework that combines Supervised Fine-Tuning on progressively harder problems (vertical axis) with Reverse Curriculum Reinforcement Learning to reduce reliance on step-by-step guidance (horizontal axis). We design a composite reward combining correctness, step-wise alignment, continuity, and numeric incentives, optimized via Group Relative Policy Optimization (GRPO). We release CL-Math, a dataset of 29k problems with step-level annotations in English, Hindi, and Marathi. Across standard benchmarks and curated multilingual test sets, IRIS consistently improves performance, with strong results on math reasoning tasks and substantial gains in low-resource and bilingual settings, alongside modest improvements in high-resource languages.
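As a rough illustration of how a composite reward can feed GRPO, the sketch below sums four component scores with weights and then computes group-relative advantages by standardizing rewards within a group of sampled completions for the same problem. The weights, the assumption that each component score lies in [0, 1], and the function names `composite_reward` and `group_relative_advantages` are all hypothetical; the abstract names the four components but not how they are computed or weighted. The use of the population standard deviation is also a choice made here, since implementations differ.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def composite_reward(
    correct: float,
    step_align: float,
    continuity: float,
    numeric: float,
    weights: Sequence[float] = (1.0, 0.5, 0.25, 0.25),  # assumed weights, not from the paper
) -> float:
    """Weighted sum of the four reward components named in the abstract."""
    parts = (correct, step_align, continuity, numeric)
    return sum(w * p for w, p in zip(weights, parts))


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Standardize rewards within one group of completions for the same prompt.

    GRPO scores each completion relative to its group's mean reward,
    scaled by the group's standard deviation, instead of using a learned critic.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four sampled completions for one problem, scored on each component.
    group = [
        composite_reward(1.0, 0.9, 1.0, 1.0),
        composite_reward(0.0, 0.6, 0.8, 0.5),
        composite_reward(1.0, 0.4, 0.5, 1.0),
        composite_reward(0.0, 0.1, 0.2, 0.0),
    ]
    print(group_relative_advantages(group))
```

Completions that score above the group mean receive positive advantages and are reinforced; if every completion in a group earns the same reward, all advantages are zero and that group contributes no learning signal.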

Problem

Research questions and friction points this paper is trying to address.

cross-lingual transfer

mathematical reasoning

low-resource languages

curriculum learning

multilingual reasoning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Curriculum Learning

Reinforcement Learning

Cross-lingual Reasoning