Deep Reasoning in General Purpose Agents via Structured Meta-Cognition

📅 2026-05-11

🤖 AI Summary

This work addresses a limitation of existing large language model (LLM) agents. They typically rely on fixed reasoning scaffolds and, unlike humans, struggle to adapt their reasoning strategy when a complex problem demands it. To overcome this, the paper introduces Deep Reasoning, an inference-time approach that treats the construction of a reasoning scaffold as an executable meta-reasoning process. Using structured meta-cognition, the method generates a task-adapted reasoning workflow at test time, guided by in-context examples, that coordinates associative reasoning, formal computation, and recursive subproblem decomposition. The resulting agent, DOLORES, outperforms all evaluated scaffolds on four hard benchmarks across three model sizes and two model families, improving over the strongest evaluated scaffold baseline by 24.8% on average. Its 8B variant surpasses all evaluated 32B baselines from the same family in more than half of the settings, and distributing work across lower-load reasoning threads reduces premature termination and hallucination.

📝 Abstract

Humans intuitively solve complex problems by flexibly shifting among reasoning modes: they plan, execute, revise intermediate goals, resolve ambiguity through associative judgment, and apply formal procedures to well-specified subproblems. Current LLM agents lack this flexibility, as their scaffolds hard-code such reasoning decisions in advance. These scaffolds are effective when their prescribed structure matches the task, but brittle when solving the task requires adapting the structure of reasoning itself. We introduce Deep Reasoning -- an inference-time approach for constructing task-specific scaffolds through structured meta-reasoning. Deep Reasoning uses a formal language that represents meta-reasoning as executable decompositions over associative inference, formal computation, and recursive subproblem solving, enabling decomposition principles to be encoded as in-context examples that guide test-time scaffold construction. We instantiate this approach in a general-purpose agent (DOLORES) that distributes complex tasks across more controlled reasoning threads. We evaluate it against state-of-the-art scaffolding methods across four hard benchmarks: multi-hop reasoning, long-chain question answering, long-context aggregation, and deep research-style information seeking. DOLORES outperforms all evaluated scaffolds across three model sizes and two model families, improving over the strongest evaluated scaffold baseline by 24.8% on average. DOLORES distributes cognition across structured, lower-load reasoning threads, thereby reducing premature termination and hallucinations. This advantage can even bridge the scaling gap, with an 8B version surpassing all evaluated 32B baselines from the same family in more than half the settings. These results point toward future agentic systems that treat scaffolding as adaptive reasoning, constructing the structure each task requires just-in-time.
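The abstract describes meta-reasoning as executable decompositions over three kinds of steps: associative inference, formal computation, and recursive subproblem solving. The paper's actual formal language is not reproduced on this page, so the following is only a minimal Python sketch under stated assumptions. The names `Associate`, `Compute`, `Recurse`, `Plan`, and `execute` are hypothetical, not the authors' API, and a stubbed lookup table stands in for the model call. One possible reading of "lower-load reasoning threads" is that each recursive subproblem sees only the inputs it declares; that is also an assumption here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union


# Hypothetical step types, one per reasoning mode named in the abstract.
@dataclass
class Associate:
    """Associative inference: a model call whose prompt is filled from known values."""
    prompt: str
    out: str  # variable name that receives the answer


@dataclass
class Compute:
    """Formal computation: a deterministic Python function over earlier results."""
    fn: Callable[..., object]
    args: List[str]
    out: str


@dataclass
class Recurse:
    """Recursive subproblem: a nested plan run with only the inputs it declares."""
    plan: "Plan"
    inputs: Dict[str, str]  # child variable name -> parent variable name
    out: str


Step = Union[Associate, Compute, Recurse]


@dataclass
class Plan:
    steps: List[Step]
    result: str  # variable returned when the plan finishes


def execute(plan: Plan, env: Dict[str, object], llm: Callable[[str], str]) -> object:
    """Run the steps in order, threading named results through a local environment."""
    local = dict(env)
    for step in plan.steps:
        if isinstance(step, Associate):
            local[step.out] = llm(step.prompt.format(**local))
        elif isinstance(step, Compute):
            local[step.out] = step.fn(*(local[name] for name in step.args))
        elif isinstance(step, Recurse):
            # Assumption: the child sees only its declared inputs, keeping its context small.
            child_env = {child: local[parent] for child, parent in step.inputs.items()}
            local[step.out] = execute(step.plan, child_env, llm)
        else:
            raise TypeError(f"unknown step: {step!r}")
    return local[plan.result]


# Usage with a stubbed "model" that answers from a fixed table (hypothetical stand-in for an LLM).
FACTS = {
    "Birth year of Ada Lovelace?": "1815",
    "Birth year of Alan Turing?": "1912",
}


def stub_llm(prompt: str) -> str:
    return FACTS[prompt]


lookup = Plan(steps=[Associate("Birth year of {person}?", "year")], result="year")
plan = Plan(
    steps=[
        Recurse(lookup, {"person": "a"}, "year_a"),
        Recurse(lookup, {"person": "b"}, "year_b"),
        Compute(lambda x, y: int(y) - int(x), ["year_a", "year_b"], "gap"),
    ],
    result="gap",
)
print(execute(plan, {"a": "Ada Lovelace", "b": "Alan Turing"}, stub_llm))  # 97
```

In this sketch the plan is written by hand. In the approach the abstract describes, the agent would instead construct a decomposition like this at inference time, guided by in-context examples of good decompositions.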

Problem

Research questions and friction points this paper is trying to address.

reasoning flexibility

LLM agents

reasoning scaffolds

meta-cognition

adaptive reasoning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Deep Reasoning

structured meta-cognition

dynamic scaffolding