ARIES: Autonomous Reasoning with LLMs on Interactive Thought Graph Environments

📅 2025-02-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the limitations of predefined transformation schedules, high inference costs, and extensive hyperparameter search in decomposable reasoning tasks, this paper proposes ARIES, a multi-agent architecture that arranges intermediate reasoning steps as a dynamic thought graph and models the graph's evolution as a Markov decision process (MDP). An off-the-shelf large language model (LLM), with no supervised fine-tuning, serves as a policy agent that maintains visibility of the thought graph state and autonomously selects graph transformation actions (such as decomposition and merging), while separate reasoning LLM agents solve the decomposed subproblems. On HumanEval, this yields up to 29% higher accuracy than static transformation schedules while reducing inference costs by 35% and eliminating manual scheduling and hyperparameter search. The authors also analyze failure modes, observing that policy LLM size and the depth of problem decomposition limit the scaling of LLM-guided reasoning.

📝 Abstract
Recent research has shown that LLM performance on reasoning tasks can be enhanced by scaling test-time compute. One promising approach, particularly with decomposable problems, involves arranging intermediate solutions as a graph on which transformations are performed to explore the solution space. However, prior works rely on pre-determined, task-specific transformation schedules which are subject to a set of searched hyperparameters. In this work, we view thought graph transformations as actions in a Markov decision process, and implement policy agents to drive effective action policies for the underlying reasoning LLM agent. In particular, we investigate the ability of another LLM to act as a policy agent on thought graph environments and introduce ARIES, a multi-agent architecture for reasoning with LLMs. In ARIES, reasoning LLM agents solve decomposed subproblems, while policy LLM agents maintain visibility of the thought graph states and dynamically adapt the problem-solving strategy. Through extensive experiments, we observe that using off-the-shelf LLMs as policy agents with no supervised fine-tuning (SFT) can yield up to 29% higher accuracy on HumanEval relative to static transformation schedules, while reducing inference costs by 35% and avoiding any search requirements. We also conduct a thorough analysis of observed failure modes, highlighting that limitations on LLM sizes and the depth of problem decomposition can be seen as challenges to scaling LLM-guided reasoning.
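The abstract's core idea, treating thought graph transformations as actions in an MDP driven by a policy agent, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the node fields, action set, and `toy_policy` heuristic are assumptions standing in for the policy LLM and reasoning LLM agents described in the paper.

```python
# Sketch (hypothetical, not ARIES code): thought-graph evolution as an MDP.
# States are graphs of subproblems; actions are graph transformations;
# a real system would ask a policy LLM to pick the action from the state.
from dataclasses import dataclass, field
from enum import Enum, auto

class Action(Enum):
    DECOMPOSE = auto()  # split a problem node into subproblems
    MERGE = auto()      # combine solved subproblems into a parent solution
    PRUNE = auto()      # discard unsolved, unpromising branches

@dataclass
class Node:
    text: str
    solved: bool = False
    children: list = field(default_factory=list)

def toy_policy(graph: Node) -> Action:
    """Stand-in for the policy LLM: observe the graph state, choose an action."""
    if not graph.children:
        return Action.DECOMPOSE
    if all(c.solved for c in graph.children):
        return Action.MERGE
    return Action.PRUNE

def step(graph: Node, action: Action) -> Node:
    """MDP transition: apply the chosen transformation to the thought graph.
    In a full system, DECOMPOSE/solving would call reasoning LLM agents."""
    if action is Action.DECOMPOSE:
        graph.children = [Node(f"{graph.text}/sub{i}", solved=True) for i in (1, 2)]
    elif action is Action.MERGE:
        graph.solved = True
    elif action is Action.PRUNE:
        graph.children = [c for c in graph.children if c.solved]
    return graph

root = Node("task")
while not root.solved:
    root = step(root, toy_policy(root))
```

The loop terminates once the policy observes that all subproblems are solved and chooses MERGE, mirroring the adaptive, schedule-free control the paper attributes to its policy agents.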
Problem

Research questions and friction points this paper is trying to address.

Enhance LLM reasoning via autonomous thought graph transformations.
Dynamic policy agents adapt strategies for efficient problem-solving.
Reduce inference costs and improve accuracy without fine-tuning.
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLMs as policy agents in thought graphs
Dynamic adaptation of problem-solving strategies
No supervised fine-tuning for policy agents
Pedro Gimenes
Imperial College London
Machine Learning
Zeyu Cao
Department of Computer Science and Technology, University of Cambridge
Jeffrey Wong
Department of Electrical & Electronic Engineering, Imperial College London
Yiren Zhao
University of Toronto
Computer Networks · Optical Networks · Datacenter Networks