Improving Long-Range Navigation with Spatially-Enhanced Recurrent Memory via End-to-End Reinforcement Learning

📅 2025-06-06

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the limited spatial memory of recurrent neural networks (RNNs) such as LSTMs and GRUs in end-to-end reinforcement learning (RL) for robot navigation, in particular their difficulty fusing sequential observations taken from different viewpoints into a representation that supports long-horizon planning, this paper proposes Spatially-Enhanced Recurrent Units (SRUs), a simple modification to existing RNNs. Combined with an attention-based architecture, SRUs enable implicit long-range spatial memorization and planning from a single forward-facing stereo camera. The policy is trained end-to-end with RL, using regularization techniques to keep recurrent training robust, and large-scale pretraining on synthetic depth data enables zero-shot sim-to-real transfer. Experiments show a 23.5% improvement in long-range navigation over existing RNNs and a 29.6% improvement over an RL baseline with explicit mapping and memory modules in environments that require long-horizon mapping and memorization. The approach also transfers zero-shot to diverse, complex real-world environments without any real-world fine-tuning.
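
For readers less familiar with the baseline, the following is a minimal sketch of how a standard GRU folds a sequence of observations into a single latent state. It shows the conventional GRU update (PyTorch-style gate layout), not the paper's SRU, whose equations are not given in this summary. The class and function names (GRUCell, encode_sequence), the sizes, and the random "observation features" are hypothetical stand-ins chosen for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class GRUCell:
    """Minimal GRU cell (PyTorch-style gate layout); illustrative, untrained weights."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        bound = 1.0 / np.sqrt(hidden_size)
        # Rows are stacked as [reset r | update z | candidate n].
        self.W_x = rng.uniform(-bound, bound, (3 * hidden_size, input_size))
        self.W_h = rng.uniform(-bound, bound, (3 * hidden_size, hidden_size))
        self.b_x = np.zeros(3 * hidden_size)
        self.b_h = np.zeros(3 * hidden_size)
        self.hidden_size = hidden_size

    def step(self, x, h):
        H = self.hidden_size
        gx = self.W_x @ x + self.b_x
        gh = self.W_h @ h + self.b_h
        r = sigmoid(gx[:H] + gh[:H])              # reset gate
        z = sigmoid(gx[H:2 * H] + gh[H:2 * H])    # update gate
        n = np.tanh(gx[2 * H:] + r * gh[2 * H:])  # candidate state
        # New state is a gated blend of the candidate and the previous state.
        return (1.0 - z) * n + z * h


def encode_sequence(cell, observations):
    """Fold a sequence of per-step feature vectors into one latent state."""
    h = np.zeros(cell.hidden_size)
    for x in observations:
        h = cell.step(x, h)
    return h


cell = GRUCell(input_size=8, hidden_size=16)
# Random stand-ins for per-frame image features (hypothetical).
observations = np.random.default_rng(1).normal(size=(20, 8))
latent = encode_sequence(cell, observations)
print(latent.shape)  # (16,)
```

Nothing in this update explicitly transforms earlier observations into the robot's current viewpoint; the fixed-size state has to learn any such spatial bookkeeping on its own. According to the abstract, this kind of spatial integration is what standard LSTMs and GRUs struggle with.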

📝 Abstract

Recent advancements in robot navigation, especially with end-to-end learning approaches like reinforcement learning (RL), have shown remarkable efficiency and effectiveness. Yet, successful navigation still relies on two key capabilities: mapping and planning, whether explicit or implicit. Classical approaches use explicit mapping pipelines to register ego-centric observations into a coherent map frame for the planner. In contrast, end-to-end learning achieves this implicitly, often through recurrent neural networks (RNNs) that fuse current and past observations into a latent space for planning. While architectures such as LSTM and GRU capture temporal dependencies, our findings reveal a key limitation: their inability to perform effective spatial memorization. This skill is essential for transforming and integrating sequential observations from varying perspectives to build spatial representations that support downstream planning. To address this, we propose Spatially-Enhanced Recurrent Units (SRUs), a simple yet effective modification to existing RNNs, designed to enhance spatial memorization capabilities. We introduce an attention-based architecture with SRUs, enabling long-range navigation using a single forward-facing stereo camera. Regularization techniques are employed to ensure robust end-to-end recurrent training via RL. Experimental results show our approach improves long-range navigation by 23.5% compared to existing RNNs. Furthermore, with SRU memory, our method outperforms the RL baseline with explicit mapping and memory modules, achieving a 29.6% improvement in diverse environments requiring long-horizon mapping and memorization. Finally, we address the sim-to-real gap by leveraging large-scale pretraining on synthetic depth data, enabling zero-shot transfer to diverse and complex real-world environments.
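
The abstract contrasts implicit RNN memory with classical pipelines that register ego-centric observations into a shared map frame. As a point of reference, here is a minimal 2D sketch of that classical registration step, assuming known robot poses (x, y, yaw) in SE(2) and a simple occupancy grid. It is not the paper's method or its explicit-mapping baseline; the function names, grid size, resolution, and landmark data are all hypothetical.

```python
import numpy as np


def ego_to_world(points_ego, pose):
    """Map (N, 2) points from the robot (ego) frame into the world frame.

    pose = (x, y, yaw): robot position and heading in the world frame (SE(2)).
    """
    x, y, yaw = pose
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, -s], [s, c]])
    # Rotate each row vector by R, then translate by the robot position.
    return points_ego @ R.T + np.array([x, y])


def register(grid, points_world, resolution=1.0, origin=(0.0, 0.0)):
    """Mark the grid cells containing world-frame points as occupied (1)."""
    idx = np.floor((points_world - np.asarray(origin)) / resolution).astype(int)
    # Drop points that fall outside the map bounds.
    inside = np.all((idx >= 0) & (idx < np.array(grid.shape)), axis=1)
    grid[idx[inside, 0], idx[inside, 1]] = 1
    return grid


grid = np.zeros((10, 10), dtype=int)
# The same landmark seen from two different poses (hypothetical data):
# from pose A it is 5 m straight ahead; from pose B, facing +y, also 5 m ahead.
pose_a = (0.5, 5.5, 0.0)
pose_b = (5.5, 0.5, np.pi / 2)
obs_a = np.array([[5.0, 0.0]])
obs_b = np.array([[5.0, 0.0], [50.0, 0.0]])  # second point falls outside the map
for pose, obs in [(pose_a, obs_a), (pose_b, obs_b)]:
    register(grid, ego_to_world(obs, pose))
print(grid.sum(), grid[5, 5])  # 1 1
```

The two observations look identical in the ego frame yet land in the same map cell only because each robot pose is applied explicitly. An end-to-end recurrent policy has to learn an equivalent transformation implicitly, which is the capability the abstract says standard RNNs lack.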

Problem

Research questions and friction points this paper is trying to address.

Enhance spatial memorization in RNNs for navigation

Improve long-range navigation with a single forward-facing stereo camera

Address the sim-to-real gap via large-scale pretraining on synthetic depth data

Innovation

Methods, ideas, or system contributions that make the work stand out.

Spatially-Enhanced Recurrent Units (SRUs) for spatial memorization

Attention-based architecture with SRUs for long-range navigation

Large-scale pretraining on synthetic depth data for zero-shot sim-to-real transfer
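
The contributions list an attention-based architecture built around SRUs, but this summary does not say where attention sits or how SRUs change the recurrent update. Purely as a generic illustration, the sketch below uses standard scaled dot-product attention, in which a query (for example, a current state) reads a weighted combination of stored feature vectors. This is a swapped-in textbook technique, not the paper's architecture; the names attend and softmax, the memory size, and the random features are hypothetical.

```python
import numpy as np


def softmax(x):
    x = x - np.max(x)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum()


def attend(query, keys, values):
    """Scaled dot-product attention for a single query.

    query: (d,), keys: (T, d), values: (T, d_v).
    Returns the attended read-out (d_v,) and the attention weights (T,).
    """
    scores = keys @ query / np.sqrt(query.shape[0])
    weights = softmax(scores)
    return weights @ values, weights


rng = np.random.default_rng(0)
# Hypothetical memory of T=6 stored feature vectors, normalized to unit length.
keys = rng.normal(size=(6, 4))
keys = keys / np.linalg.norm(keys, axis=1, keepdims=True)
values = rng.normal(size=(6, 3))
query = 10.0 * keys[2]  # a query aligned with memory slot 2
readout, weights = attend(query, keys, values)
print(weights.argmax())  # 2
```

Because the keys have unit length, the query aligned with slot 2 scores highest there, so most of the weight goes to that slot's value. The weights always sum to one, so the read-out is a convex combination of the stored values.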
