🤖 AI Summary
This work addresses the limited performance of large language models on long-context reasoning tasks, which hinders their applicability in complex real-world scenarios. The study introduces a decomposition-based approach to long-context reasoning built on the concept of “atomic skills”: fundamental, trainable units of reasoning capability. Complex reasoning processes are systematically decomposed into these atomic skills and strengthened through task decomposition, synthetic pseudo-data generation, and reinforcement learning-based fine-tuning. This strategy departs from conventional end-to-end training paradigms and achieves substantial improvements across six established benchmarks, including Loogle and Loong, raising average performance from 46.3% to 54.0% (a 7.7-point absolute gain) and thereby significantly enhancing the model’s capacity for long-context reasoning.
📝 Abstract
Long-context reasoning is essential for complex real-world applications, yet it remains a significant challenge for Large Language Models (LLMs). Despite rapid progress in long-context reasoning, current research often overlooks the internal complexity of the task itself. In this paper, we move beyond this holistic view: we decompose long-context reasoning into a set of fundamental atomic skills and then automatically synthesize a suite of pseudo datasets, each explicitly targeting a specific atomic skill. Our empirical analysis confirms that proficiency in these atomic skills is strongly correlated with general long-context reasoning performance. Building on this insight, we apply reinforcement learning on these pseudo datasets to sharpen the model's atomic skills, with the aim of boosting its general long-context reasoning ability. Extensive experiments across multiple benchmarks demonstrate the effectiveness of our approach: it outperforms a strong baseline by an average margin of 7.7% (improving from 46.3% to 54.0%) across Loogle, Loong, LongBench-v2, BrowscompLong, Ruler-qa2, and MRCR.
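The pipeline described above (decompose into atomic skills, synthesize skill-targeted pseudo data, score outputs with a verifiable reward for RL) can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual implementation: the skill taxonomy, the `synthesize_retrieval_example` generator, and the binary exact-match reward are all hypothetical placeholders chosen for clarity.

```python
from dataclasses import dataclass
import random

# Hypothetical atomic-skill taxonomy; the paper's actual skill set is not specified here.
ATOMIC_SKILLS = ["retrieval", "aggregation", "comparison", "multi_hop"]

@dataclass
class PseudoExample:
    skill: str
    context: str
    question: str
    answer: str

def synthesize_retrieval_example(n_distractors: int = 50, seed: int = 0) -> PseudoExample:
    """Synthesize one pseudo example targeting the 'retrieval' skill:
    a single key fact hidden among many distractor lines."""
    rng = random.Random(seed)
    key, value = "K-7", str(rng.randint(100, 999))
    lines = [f"item {i}: {rng.randint(100, 999)}" for i in range(n_distractors)]
    lines.insert(rng.randrange(len(lines) + 1), f"item {key}: {value}")
    return PseudoExample(
        skill="retrieval",
        context="\n".join(lines),
        question=f"What is the value of item {key}?",
        answer=value,
    )

def reward(example: PseudoExample, model_output: str) -> float:
    """Verifiable reward for RL fine-tuning: 1.0 on an exact answer match, else 0.0.
    Because the pseudo data is synthesized, the gold answer is known by construction."""
    return 1.0 if model_output.strip() == example.answer else 0.0

ex = synthesize_retrieval_example()
print(ex.skill, "|", ex.question, "| gold:", ex.answer)
```

In practice each atomic skill would get its own generator, and the resulting datasets would drive an RL loop where the model is rewarded for answering the synthesized questions correctly.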