HiRO-Nav: Hybrid ReasOning Enables Efficient Embodied Navigation

πŸ“… 2026-04-09
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
This work addresses the high computational cost and response latency of large reasoning models (LRMs) in long-horizon navigation by introducing HiRO-Nav, an agent that incorporates a novel action-entropy–based dynamic reasoning scheduling mechanism. HiRO-Nav activates the LRM for deep reasoning only at high-entropy critical decision points, while executing other steps rapidly without invoking the model. The training framework combines hybrid supervised fine-tuning and online reinforcement learning, leveraging the correlation between action entropy and Q-values to guide when to trigger costly reasoning. Evaluated on the CHORES-π•Š ObjectNav benchmark, HiRO-Nav achieves competitive success rates while substantially reducing token consumption, outperforming baselines that either always or never invoke the LRM, thereby striking an optimal balance between efficiency and performance.
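The core mechanism described above — invoking the costly LRM only when the policy's action distribution is uncertain — can be sketched as an entropy threshold over the per-step action logits. This is an illustrative reconstruction, not the paper's implementation: the function names, the fixed `threshold`, and the `deep_reason` callback are all hypothetical stand-ins (the paper learns when to trigger reasoning via SFT and online RL using the entropy–Q-value correlation).

```python
import math

def action_entropy(logits):
    """Shannon entropy (nats) of the softmax distribution over action logits."""
    m = max(logits)                              # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_action(logits, threshold, deep_reason):
    """Entropy-gated step: reason deeply only when the policy is uncertain.

    Returns (action_index, reasoned) where `reasoned` marks a costly LRM call.
    """
    if action_entropy(logits) > threshold:
        return deep_reason(logits), True         # high-entropy critical decision point
    # low entropy: act reflexively on the greedy action, no LRM invocation
    return max(range(len(logits)), key=lambda i: logits[i]), False

# A peaked distribution (confident step) vs. a near-uniform one (uncertain step).
peaked = [5.0, 0.1, 0.2, 0.1]
flat = [1.0, 1.1, 0.9, 1.0]
```

Under this sketch, most steps fall below the threshold and skip the LRM entirely, which is where the token savings reported on CHORES-π•Š would come from.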
πŸ“ Abstract
Embodied navigation agents built upon large reasoning models (LRMs) can handle complex, multimodal environmental input and perform grounded reasoning at each step to improve sequential decision-making in long-horizon tasks. However, a critical question remains: *how can the reasoning capabilities of LRMs be harnessed intelligently and efficiently for long-horizon navigation tasks?* In simple scenes, agents are expected to act reflexively, while in complex ones they should engage in deliberate reasoning before acting. To achieve this, we introduce the **H**ybr**i**d **R**eas**O**ning **Nav**igation (**HiRO-Nav**) agent, the first agent capable of adaptively deciding whether to think at each step based on its own action entropy. Specifically, by examining how the agent's action entropy evolves over navigation trajectories, we observed that only a small fraction of actions exhibit high entropy, and that these actions often steer the agent toward novel scenes or critical objects. Furthermore, studying the relationship between action entropy and task completion (i.e., Q-value) reveals that improving high-entropy actions contributes more to task success. Hence, we propose a tailored training pipeline comprising hybrid supervised fine-tuning as a cold start, followed by online reinforcement learning with the proposed hybrid reasoning strategy that explicitly activates reasoning only for high-entropy actions, significantly reducing computational overhead while improving decision quality. Extensive experiments on the CHORES-π•Š ObjectNav benchmark show that HiRO-Nav achieves a better trade-off between success rate and token efficiency than both dense-thinking and no-thinking baselines.
Problem

Research questions and friction points this paper is trying to address.

embodied navigation
large reasoning models
action entropy
long-horizon tasks
token efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

hybrid reasoning
action entropy
embodied navigation
large reasoning models
token efficiency