Hierarchical Reinforcement Learning with Augmented Step-Level Transitions for LLM Agents

📅 2026-04-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the high computational cost and poor scalability of existing large language model agents in complex decision-making tasks, which typically rely on full interaction histories. The authors propose STEP-HRL, a novel hierarchical reinforcement learning framework that operates solely on single-step transitions. By decomposing tasks into subtasks to form a hierarchical structure, the method leverages representations of completed subtasks to capture global progress and incorporates a local progress module to compress intra-subtask history, thereby generating enriched step-level transition information. This approach substantially reduces dependence on long-horizon histories and achieves superior performance over current baselines on the ScienceWorld and ALFWorld benchmarks, demonstrating notable improvements in task success rate, generalization capability, and token efficiency.
📝 Abstract
Large language model (LLM) agents have demonstrated strong capabilities in complex interactive decision-making tasks. However, existing LLM agents typically rely on increasingly long interaction histories, resulting in high computational cost and limited scalability. In this paper, we propose STEP-HRL, a hierarchical reinforcement learning (HRL) framework that enables step-level learning by conditioning only on single-step transitions rather than full interaction histories. STEP-HRL structures tasks hierarchically, using completed subtasks to represent the global progress of the overall task. By introducing a local progress module, it also iteratively and selectively summarizes the interaction history within each subtask to produce a compact summary of local progress. Together, these components yield augmented step-level transitions for both the high-level and low-level policies. Experimental results on the ScienceWorld and ALFWorld benchmarks consistently demonstrate that STEP-HRL substantially outperforms baselines in performance and generalization while reducing token usage. Our code is available at https://github.com/TonyStark042/STEP-HRL.
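The core idea in the abstract, replacing the full interaction history with an augmented single-step transition built from two compressed progress signals, can be sketched as below. This is a minimal illustration only; all names (`AugmentedTransition`, `build_transition`, `summarize_local`) and the truncation-based summarizer are assumptions, not the authors' implementation.

```python
# Hedged sketch of an augmented step-level transition: the policy conditions
# on the current observation plus two compact progress signals, instead of
# the entire interaction history. Names are illustrative, not from the paper's code.

from dataclasses import dataclass


@dataclass
class AugmentedTransition:
    """A single-step transition enriched with progress signals."""
    observation: str       # current environment observation
    global_progress: str   # completed subtasks stand in for long-horizon history
    local_progress: str    # compact summary of intra-subtask interaction
    action: str


def summarize_local(history, max_steps=3):
    """Toy stand-in for a local progress module: keep only the most
    recent intra-subtask steps as a compact summary string."""
    return " | ".join(history[-max_steps:])


def build_transition(observation, action, completed_subtasks, local_summary):
    """Assemble the augmented step-level input for a policy, so prompt
    length stays roughly constant regardless of episode length."""
    global_progress = "; ".join(completed_subtasks) or "none"
    return AugmentedTransition(observation, global_progress, local_summary, action)


# Example: a low-level policy's input after several intra-subtask steps.
history = ["open drawer", "take key", "go to door", "unlock door"]
t = build_transition(
    observation="You are at the door. It is unlocked.",
    action="open door",
    completed_subtasks=["find key", "reach door"],
    local_summary=summarize_local(history),
)
print(t.global_progress)  # "find key; reach door"
print(t.local_progress)   # "take key | go to door | unlock door"
```

The point of the sketch is the interface, not the summarizer: a real local progress module would summarize selectively (e.g. with an LLM), but the policy's input stays a fixed-size transition either way.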
Problem

Research questions and friction points this paper is trying to address.

Large Language Model Agents
Interaction History
Computational Cost
Scalability
Hierarchical Reinforcement Learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical Reinforcement Learning
Step-Level Transitions
LLM Agents
Local Progress Summarization
Token Efficiency
Shuai Zhen
Beijing University of Posts and Telecommunications
Yanhua Yu
Beijing University of Posts and Telecommunications
Ruopei Guo
China Mobile Group Design Institute Co., Ltd
Nan Cheng
University of Michigan
Yang Deng
Singapore Management University