🤖 AI Summary
Open-source large language models (LLMs) underperform closed-source counterparts on complex reasoning and long-context tasks (e.g., 64K-token contexts), and commonly suffer from training instability and model collapse during reasoning adaptation. Method: We propose a training recipe centered on a 12.7B-parameter open-source LLM, featuring (i) a two-stage curriculum-style supervised fine-tuning (SFT), (ii) difficulty-aware reinforcement learning fine-tuning (RLFT), (iii) verified, validation-aligned synthetic data generation, (iv) mixed-policy trajectory reuse, (v) stability-aware data filtering, and (vi) hybrid parallelism with kernel-level optimizations for efficient long-context training. Contribution/Results: Despite its moderate scale, our model matches significantly larger closed-source models across mathematical reasoning, code generation, and agentic benchmarks, while ensuring training reproducibility and stability. This work delivers a high-performance, scalable, and openly accessible framework for advancing complex reasoning capabilities in the open-source community.
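The mixed-policy trajectory reuse mentioned in (iv) can be sketched roughly as follows: each RL step combines freshly sampled on-policy trajectories with a bounded buffer of trajectories from recent policy versions, amortizing the cost of expensive long-context rollouts. The class name, buffer size, and reuse ratio below are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of mixed-policy trajectory reuse for RLFT.
# Assumption: trajectories from slightly stale policies can still be
# used (e.g., with importance-weight corrections, omitted here).
from collections import deque


class TrajectoryBuffer:
    def __init__(self, max_reuse=128):
        # Oldest trajectories are evicted first once the cap is hit.
        self.buffer = deque(maxlen=max_reuse)

    def add(self, trajectories):
        self.buffer.extend(trajectories)

    def mixed_batch(self, fresh, reuse_ratio=0.25):
        """Combine fresh on-policy trajectories with up to
        reuse_ratio * len(fresh) reused ones from earlier policies."""
        n_reuse = min(len(self.buffer), int(len(fresh) * reuse_ratio))
        reused = [self.buffer[i] for i in range(n_reuse)]
        self.add(fresh)  # fresh trajectories become reusable next step
        return fresh + reused


buf = TrajectoryBuffer()
step1 = buf.mixed_batch(["t1", "t2", "t3", "t4"])  # buffer empty: no reuse
step2 = buf.mixed_batch(["t5", "t6", "t7", "t8"])  # reuses 1 earlier trajectory
print(step1)  # -> ['t1', 't2', 't3', 't4']
print(step2)  # -> ['t5', 't6', 't7', 't8', 't1']
```

In practice such a buffer would also track the policy version of each trajectory so stale samples can be discarded or reweighted; this sketch only shows the batching mechanics.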
📝 Abstract
We introduce Motif-2-12.7B-Reasoning, a 12.7B-parameter language model designed to bridge the gap between open-weight systems and proprietary frontier models in complex reasoning and long-context understanding. Addressing the common challenges of model collapse and training instability in reasoning adaptation, we propose a comprehensive, reproducible training recipe spanning system, data, and algorithmic optimizations. Our approach pairs memory-efficient infrastructure for 64K-token contexts, built on hybrid parallelism and kernel-level optimizations, with a two-stage Supervised Fine-Tuning (SFT) curriculum that mitigates distribution mismatch through verified, aligned synthetic data. Furthermore, we detail a robust Reinforcement Learning Fine-Tuning (RLFT) pipeline that stabilizes training via difficulty-aware data filtering and mixed-policy trajectory reuse. Empirical results demonstrate that Motif-2-12.7B-Reasoning achieves performance comparable to models with significantly larger parameter counts across mathematics, coding, and agentic benchmarks, offering the community a competitive open model and a practical blueprint for scaling reasoning capabilities under realistic compute constraints.
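The difficulty-aware data filtering described in the abstract can be sketched in minimal form as follows: prompts that the current policy always solves or never solves contribute no learning signal in group-relative RL objectives, so only prompts with an intermediate empirical pass rate are kept. The function name, the 0/1 success encoding, and the strict (0, 1) window are illustrative assumptions, not the paper's actual criterion.

```python
# Hypothetical sketch of difficulty-aware filtering for RLFT.
# Input: each prompt has been rolled out several times with the current
# policy, and each rollout is scored as 1 (verified correct) or 0.

def difficulty_filter(rollouts, low=0.0, high=1.0):
    """rollouts: dict mapping prompt_id -> list of 0/1 success flags.
    Returns prompt_ids whose pass rate lies strictly inside (low, high),
    i.e., prompts that are neither trivial nor currently unsolvable."""
    kept = []
    for prompt_id, successes in rollouts.items():
        pass_rate = sum(successes) / len(successes)
        if low < pass_rate < high:
            kept.append(prompt_id)
    return kept


# Example: "a" is trivially solved, "c" is never solved; only "b" remains.
rollouts = {
    "a": [1, 1, 1, 1],
    "b": [1, 0, 1, 0],
    "c": [0, 0, 0, 0],
}
print(difficulty_filter(rollouts))  # -> ['b']
```

Tightening `low` and `high` (e.g., keeping pass rates in [0.2, 0.8]) is a common way to concentrate training on mid-difficulty prompts; the paper's exact thresholds and pass-rate estimator are not specified here.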