Motif-2-12.7B-Reasoning: A Practitioner's Guide to RL Training Recipes

📅 2025-12-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
Open-source large language models (LLMs) underperform their closed-source counterparts on complex reasoning and long-context tasks (e.g., 64K-token inputs), and commonly suffer from training instability and model collapse during reasoning adaptation. Method: We propose a training recipe centered on a 12.7B-parameter open-source LLM, featuring (i) two-stage curriculum-style supervised fine-tuning (SFT), (ii) difficulty-aware reinforcement learning fine-tuning (RLFT), (iii) validation-aligned synthetic data generation, (iv) policy trajectory reuse, (v) stability-aware filtering, and (vi) hybrid parallelism with kernel-level optimizations for efficient long-context training. Contribution/Results: Despite its moderate scale, the model performs on par with significantly larger closed-source models across mathematical reasoning, code generation, and agentic benchmarks, while preserving training reproducibility and stability. This work delivers a high-performance, scalable, and openly accessible framework for advancing complex reasoning capabilities in the open-source community.
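The difficulty-aware RLFT component implies selecting training prompts by how often the current policy already solves them: prompts that are always or never solved yield near-zero advantage under group-relative updates and mostly add noise. Below is a minimal sketch of such a pass-rate filter; the `sample_rollouts` and `verify` callables, the sample count, and the pass-rate band are illustrative assumptions, not values from the paper.

```python
from typing import Callable, List

def filter_by_difficulty(
    prompts: List[str],
    sample_rollouts: Callable[[str, int], List[str]],  # hypothetical: draws k completions from the current policy
    verify: Callable[[str, str], bool],                # hypothetical: checks a completion against the reference answer
    k: int = 8,
    low: float = 0.125,
    high: float = 0.875,
) -> List[str]:
    """Keep prompts whose empirical pass rate lies strictly inside (low, high)."""
    kept: List[str] = []
    for prompt in prompts:
        completions = sample_rollouts(prompt, k)
        pass_rate = sum(verify(prompt, c) for c in completions) / k
        # Always-solved (rate ~1) and never-solved (rate ~0) prompts give
        # little gradient signal in group-relative RL, so we drop them.
        if low < pass_rate < high:
            kept.append(prompt)
    return kept
```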

📝 Abstract
We introduce Motif-2-12.7B-Reasoning, a 12.7B parameter language model designed to bridge the gap between open-weight systems and proprietary frontier models in complex reasoning and long-context understanding. Addressing the common challenges of model collapse and training instability in reasoning adaptation, we propose a comprehensive, reproducible training recipe spanning system, data, and algorithmic optimizations. Our approach combines memory-efficient infrastructure for 64K-token contexts using hybrid parallelism and kernel-level optimizations with a two-stage Supervised Fine-Tuning (SFT) curriculum that mitigates distribution mismatch through verified, aligned synthetic data. Furthermore, we detail a robust Reinforcement Learning Fine-Tuning (RLFT) pipeline that stabilizes training via difficulty-aware data filtering and mixed-policy trajectory reuse. Empirical results demonstrate that Motif-2-12.7B-Reasoning achieves performance comparable to models with significantly larger parameter counts across mathematics, coding, and agentic benchmarks, offering the community a competitive open model and a practical blueprint for scaling reasoning capabilities under realistic compute constraints.
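To make the abstract's "mixed-policy trajectory reuse" concrete, here is a minimal sketch of a replay buffer that mixes fresh on-policy rollouts with trajectories saved from recent policy iterations. The buffer capacity, reuse fraction, and `Trajectory` fields are illustrative assumptions; the importance-ratio correction needed when reusing off-policy samples is assumed to happen in the loss, outside this sketch.

```python
import random
from collections import deque
from dataclasses import dataclass
from typing import List

@dataclass
class Trajectory:
    tokens: List[int]        # completion token ids
    behavior_logprob: float  # log-prob under the policy that generated it
    reward: float            # verifier / reward-model score

class MixedPolicyBuffer:
    """Mixes on-policy rollouts with trajectories replayed from recent steps."""

    def __init__(self, capacity: int = 4096, reuse_frac: float = 0.25):
        self.buffer: deque = deque(maxlen=capacity)  # recent off-policy data
        self.reuse_frac = reuse_frac                 # fraction of each batch replayed

    def build_batch(self, fresh: List[Trajectory], batch_size: int) -> List[Trajectory]:
        n_reuse = min(int(batch_size * self.reuse_frac), len(self.buffer))
        replayed = random.sample(list(self.buffer), n_reuse)
        batch = fresh[: batch_size - n_reuse] + replayed
        self.buffer.extend(fresh)  # fresh rollouts become reusable later
        return batch
```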
Problem

Research questions and friction points this paper is trying to address.

Bridges the gap between open and proprietary reasoning models
Addresses model collapse and training instability challenges
Scales reasoning capabilities under realistic compute constraints
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid parallelism and kernel optimizations for 64K-token contexts
Two-stage SFT curriculum with verified synthetic data (see the sketch after this list)
RLFT pipeline with difficulty-aware filtering and trajectory reuse
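The two-stage SFT curriculum can be read as a staged schedule that first trains broadly at moderate context length, then specializes on verified long-context reasoning traces. The sketch below encodes that shape; the dataset names, context lengths, epoch counts, and learning rates are illustrative assumptions, not the paper's settings.

```python
from typing import Any, Callable, Dict, List

# Illustrative two-stage SFT schedule; every value here is an assumption.
SFT_CURRICULUM: List[Dict[str, Any]] = [
    {"stage": 1, "data": "broad_instruction_mix",      # hypothetical dataset
     "max_seq_len": 16_384, "epochs": 2, "lr": 2e-5},
    {"stage": 2, "data": "verified_reasoning_traces",  # hypothetical dataset
     "max_seq_len": 65_536, "epochs": 1, "lr": 1e-5},  # 64K-token stage
]

def run_curriculum(model: Any, train_stage: Callable[..., Any],
                   curriculum: List[Dict[str, Any]] = SFT_CURRICULUM) -> Any:
    """Run SFT stages in order. `train_stage(model, **cfg)` is the
    caller-supplied training routine; it returns the updated model."""
    for cfg in curriculum:
        model = train_stage(model, **cfg)
    return model
```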
👥 Authors
Junghwan Lim (Motif Technologies)
Sungmin Lee (AIX, SK Telecom)
Dongseok Kim (Motif Technologies)
Taehyun Kim (Motif Technologies)
Eunhwan Park (Motif Technologies)
Jeesoo Lee (Motif Technologies)
Jeongdoo Lee (Motif Technologies)
Junhyeok Lee (Johns Hopkins University, Center for Language and Signal Processing)
Wai Ting Cheung (Motif Technologies)
Dahye Choi (Motif Technologies)
Minsu Ha (Motif Technologies)
Jaeheui Her (Motif Technologies)
Jaeyeon Huh (Motif Technologies)
Hanbin Jung (Motif Technologies)
Changjin Kang (Motif Technologies)
Beomgyu Kim (Motif Technologies)
Minjae Kim (Motif Technologies)
Taewhan Kim (Seoul National University, Department of Electrical and Computer Engineering)
Youngrok Kim (Motif Technologies)
Hyukjin Kweon (Motif Technologies)
Haesol Lee (Motif Technologies)
Kungyu Lee (Motif Technologies)
Dongpin Oh (Motif Technologies)
Yeongjae Park (Motif Technologies)
Bokki Ryu (Motif Technologies)