HiMAC: Hierarchical Macro-Micro Learning for Long-Horizon LLM Agents

📅 2026-03-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the planning deficiency, unreliable execution, and error propagation that arise when large language model agents tackle long-horizon tasks with flat autoregressive policies. To overcome these limitations, the authors propose HiMAC, a novel hierarchical policy optimization framework that operates without a critic. HiMAC explicitly decomposes decision-making into macro-level planning and micro-level execution, enabling robust control through structured blueprint generation and goal-conditioned action execution. The framework further mitigates the non-stationarity inherent in hierarchical learning via hierarchical relative advantage estimation and an alternating co-evolutionary training scheme between planner and executor. Experiments show that HiMAC achieves state-of-the-art performance on ALFWorld, WebShop, and Sokoban, substantially improving sample efficiency and delivering strong results in both textual and visual environments.

📝 Abstract
Large language model (LLM) agents have recently demonstrated strong capabilities in interactive decision-making, yet they remain fundamentally limited in long-horizon tasks that require structured planning and reliable execution. Existing approaches predominantly rely on flat autoregressive policies, where high-level reasoning and low-level actions are generated within a single token sequence, leading to inefficient exploration and severe error propagation over extended trajectories. In this work, we propose HiMAC, a hierarchical agentic RL framework that explicitly decomposes long-horizon decision-making into macro-level planning and micro-level execution. HiMAC models reasoning as a structured blueprint generation process followed by goal-conditioned action execution, enabling robust long-horizon planning within LLM-based agents. To train this hierarchy efficiently, we introduce a critic-free hierarchical policy optimization paradigm that extends group-based reinforcement learning to bi-level structures through hierarchical relative advantage estimation. Furthermore, we propose an iterative co-evolution training strategy that alternates between planner exploration and executor adaptation, mitigating the non-stationarity inherent in hierarchical learning. Extensive experiments on ALFWorld, WebShop, and Sokoban demonstrate that HiMAC consistently outperforms strong prompting and reinforcement learning baselines, achieving state-of-the-art performance and substantially improved sample efficiency across both text-based and visually grounded environments. Our results show that introducing structured hierarchy, rather than increasing model scale alone, is a key factor for enabling robust long-horizon agentic intelligence.
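The abstract's "critic-free hierarchical policy optimization" extends group-based advantage estimation (in the style of GRPO) to a two-level planner/executor structure. A minimal sketch of that idea, assuming advantages are obtained by normalizing returns within each sampled group; the function names and exact normalization are illustrative, not the authors' implementation:

```python
from statistics import mean, pstdev

def group_relative_advantages(returns, eps=1e-8):
    """Critic-free, group-based advantage: normalize a group of sampled
    returns to zero mean and (roughly) unit standard deviation."""
    mu = mean(returns)
    sigma = pstdev(returns)
    return [(r - mu) / (sigma + eps) for r in returns]

def hierarchical_advantages(plan_returns, exec_returns_per_plan):
    """Hypothetical bi-level variant: macro (planner) advantages are
    computed across a group of sampled blueprints, while micro (executor)
    advantages are computed within each blueprint's group of rollouts."""
    macro = group_relative_advantages(plan_returns)
    micro = [group_relative_advantages(g) for g in exec_returns_per_plan]
    return macro, micro
```

Normalizing the executor's returns only within its own blueprint's group conditions the micro-level signal on the current plan, which is one plausible way to read the paper's "hierarchical relative advantage estimation".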
Problem

Research questions and friction points this paper is trying to address.

long-horizon tasks
hierarchical planning
error propagation
structured reasoning
LLM agents
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical Reinforcement Learning
Long-Horizon Planning
Macro-Micro Decomposition
Critic-Free Policy Optimization
LLM Agents