Scaling Value Iteration Networks to 5000 Layers for Extreme Long-Term Planning

📅 2024-06-12
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Traditional Value Iteration Networks (VINs) struggle to scale to long-horizon, large-scale planning tasks requiring thousands of steps (e.g., navigation in 100×100 mazes), primarily due to vanishing gradients and training instability. To address this, the paper proposes DT-VIN (Dynamic Transition VIN): a deep unrolled value iteration architecture that enhances the implicit MDP representation via a dynamic transition kernel and introduces an adaptive highway loss that constructs gradient shortcuts—enabling, for the first time, end-to-end differentiable value iteration with up to 5,000 layers. This design significantly alleviates the optimization difficulties inherent in deep unrolling, ensuring stable training and reliable convergence. Evaluated on 2D maze navigation and the ViZDoom 3D navigation benchmark, DT-VIN substantially outperforms VIN and other baselines, solving ultra-long-horizon planning tasks while demonstrating strong generalization and robustness to environmental variations.

📝 Abstract
The Value Iteration Network (VIN) is an end-to-end differentiable architecture that performs value iteration on a latent MDP for planning in reinforcement learning (RL). However, VINs struggle to scale to long-term and large-scale planning tasks, such as navigating a $100 \times 100$ maze -- a task which typically requires thousands of planning steps to solve. We observe that this deficiency is due to two issues: the representation capacity of the latent MDP and the planning module's depth. We address these by augmenting the latent MDP with a dynamic transition kernel, dramatically improving its representational capacity, and, to mitigate the vanishing gradient problem, introducing an "adaptive highway loss" that constructs skip connections to improve gradient flow. We evaluate our method on both 2D maze navigation environments and the ViZDoom 3D navigation benchmark. We find that our new method, named Dynamic Transition VIN (DT-VIN), easily scales to 5000 layers and casually solves challenging versions of the above tasks. Altogether, we believe that DT-VIN represents a concrete step forward in performing long-term large-scale planning in RL environments.
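The abstract builds on value iteration unrolled into network layers. As a point of reference, here is a minimal tabular sketch of that underlying recursion (the toy MDP and all names are illustrative, not from the paper's code):

```python
# Illustrative tabular value iteration -- the recursion that a VIN
# unrolls into network layers (a sketch, not the authors' implementation).
import numpy as np

def value_iteration(P, R, gamma=0.95, n_iters=100):
    """P: (A, S, S) transition kernel; R: (S, A) rewards.
    Each loop iteration plays the role of one VIN layer, so a plan
    needing thousands of steps needs thousands of iterations/layers."""
    V = np.zeros(R.shape[0])
    for _ in range(n_iters):
        # Q(s, a) = R(s, a) + gamma * sum_s' P(s'|s, a) * V(s')
        Q = R + gamma * np.einsum("asn,n->sa", P, V)
        V = Q.max(axis=1)  # greedy Bellman backup
    return V

# Toy 2-state MDP: state 0 steps into absorbing state 1 for reward 1.
P = np.array([[[0.0, 1.0],
               [0.0, 1.0]]])   # shape (A=1, S=2, S'=2)
R = np.array([[1.0],
              [0.0]])          # shape (S=2, A=1)
print(value_iteration(P, R, gamma=0.9, n_iters=50))  # ~[1. 0.]
```

In a VIN the explicit kernel `P` is replaced by a learned convolution shared across states; per the abstract, DT-VIN's dynamic transition kernel instead lets that kernel vary with the latent state, raising representational capacity.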
Problem

Research questions and friction points this paper is trying to address.

Scaling VINs for extreme long-term planning tasks
Improving latent MDP representation and gradient flow
Enabling large-scale planning in complex environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic transition kernel enhances latent MDP
Adaptive highway loss mitigates gradient vanishing
Scales to 5000 layers for long-term planning
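The adaptive highway loss is described here only at a high level. One illustrative reading (the function name, MSE choice, and depth-based weighting rule are assumptions, not the paper's exact formulation) is that value maps at intermediate depths receive auxiliary supervision, so gradients reach shallow layers directly instead of flowing through thousands of layers:

```python
# Schematic sketch of a depth-gated auxiliary loss (assumed form, not
# the paper's exact adaptive highway loss).
import numpy as np

def highway_loss(layer_value_maps, target, path_len):
    """layer_value_maps: per-layer value predictions (list of arrays).
    Supervise every layer deep enough to have propagated value over
    `path_len` planning steps; each such term is a gradient shortcut."""
    losses = []
    for depth, v in enumerate(layer_value_maps, start=1):
        if depth >= path_len:  # this layer can already represent the plan
            losses.append(np.mean((v - target) ** 2))
    return sum(losses) / max(len(losses), 1)
```

Gating on the plan length keeps the auxiliary targets consistent: a layer shallower than the required planning horizon cannot yet match the target, so it is left unsupervised.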
👥 Authors
Yuhui Wang
AI Initiative, King Abdullah University of Science and Technology (KAUST), Saudi Arabia
Qingyuan Wu
University of Southampton; University of Liverpool
Weida Li
National University of Singapore, Singapore
Dylan R. Ashley
Ph.D. Student, Dalle Molle Institute for Artificial Intelligence Research (IDSIA USI-SUPSI)
Francesco Faccio
Senior Research Scientist, Google DeepMind
Chao Huang
University of Southampton, England
Jürgen Schmidhuber
AI Initiative, King Abdullah University of Science and Technology (KAUST), Saudi Arabia; The Swiss AI Lab IDSIA/USI/SUPSI, Switzerland; NNAISENSE, Switzerland