🤖 AI Summary
This work addresses the limited performance of large language models on long-context reasoning tasks, which hinders their applicability in complex real-world scenarios. The study introduces a decomposition-based approach to long-context reasoning built on the concept of “atomic skills”: fundamental, trainable units of reasoning capability. Complex reasoning processes are systematically decomposed into these atomic skills and strengthened through task decomposition, synthetic pseudo-data generation, and reinforcement learning-based fine-tuning. This strategy departs from conventional end-to-end training paradigms and achieves substantial improvements across six established benchmarks, including Loogle and Loong, raising average performance from 46.3% to 54.0% (a 7.7-point absolute gain) and thereby significantly enhancing the model’s capacity for long-context reasoning.
📝 Abstract
Long-context reasoning is essential for complex real-world applications, yet it remains a significant challenge for Large Language Models (LLMs). Despite rapid progress in long-context reasoning, current research often overlooks the internal complexity of the task itself. In this paper, we move beyond this holistic view: we decompose long-context reasoning into a set of fundamental atomic skills and then automatically synthesize a suite of pseudo datasets, each explicitly targeting a specific atomic skill. Our empirical analysis confirms that proficiency in these atomic skills is strongly correlated with general long-context reasoning performance. Building on this insight, we apply reinforcement learning on these pseudo datasets to sharpen the model's atomic skills, with the aim of boosting its general long-context reasoning ability. Extensive experiments across multiple benchmarks demonstrate the effectiveness of our approach: it outperforms a strong baseline by an average margin of 7.7% (improving from 46.3% to 54.0%) across Loogle, Loong, LongBench-v2, BrowscompLong, Ruler-qa2, and MRCR.
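The pipeline described above (decompose into atomic skills, synthesize skill-targeted pseudo data, score outputs with a verifiable reward for RL) can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual implementation: the skill taxonomy, the `synthesize_retrieval_example` generator, and the binary exact-match reward are all hypothetical placeholders chosen for clarity.

```python
from dataclasses import dataclass
import random

# Hypothetical atomic-skill taxonomy; the paper's actual skill set is not specified here.
ATOMIC_SKILLS = ["retrieval", "aggregation", "comparison", "multi_hop"]

@dataclass
class PseudoExample:
    skill: str
    context: str
    question: str
    answer: str

def synthesize_retrieval_example(n_distractors: int = 50, seed: int = 0) -> PseudoExample:
    """Synthesize one pseudo example targeting the 'retrieval' skill:
    a single key fact hidden among many distractor lines."""
    rng = random.Random(seed)
    key, value = "K-7", str(rng.randint(100, 999))
    lines = [f"item {i}: {rng.randint(100, 999)}" for i in range(n_distractors)]
    lines.insert(rng.randrange(len(lines) + 1), f"item {key}: {value}")
    return PseudoExample(
        skill="retrieval",
        context="\n".join(lines),
        question=f"What is the value of item {key}?",
        answer=value,
    )

def reward(example: PseudoExample, model_output: str) -> float:
    """Verifiable reward for RL fine-tuning: 1.0 on an exact answer match, else 0.0.
    Because the pseudo data is synthesized, the gold answer is known by construction."""
    return 1.0 if model_output.strip() == example.answer else 0.0

ex = synthesize_retrieval_example()
print(ex.skill, "|", ex.question, "| gold:", ex.answer)
```

In practice each atomic skill would get its own generator, and the resulting datasets would drive an RL loop where the model is rewarded for answering the synthesized questions correctly.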