Scaling Value Iteration Networks to 5000 Layers for Extreme Long-Term Planning

📅 2024-06-12
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Traditional Value Iteration Networks (VINs) struggle to scale to long-horizon, large-scale planning tasks requiring thousands of steps (e.g., navigation in 100×100 mazes), primarily due to vanishing gradients and training instability. To address this, the paper proposes DT-VIN (Dynamic Transition VIN): a deep unrolled value iteration architecture that enhances the implicit MDP representation via a dynamic transition kernel and introduces an adaptive highway loss that constructs gradient shortcuts—enabling, for the first time, end-to-end differentiable value iteration with up to 5,000 layers. This design significantly alleviates the optimization difficulties inherent in deep unrolling, ensuring stable training and reliable convergence. Evaluated on 2D maze navigation and the ViZDoom 3D navigation benchmark, DT-VIN substantially outperforms VIN and other baselines, solving ultra-long-horizon planning tasks while demonstrating strong generalization and robustness to environmental variations.

📝 Abstract
The Value Iteration Network (VIN) is an end-to-end differentiable architecture that performs value iteration on a latent MDP for planning in reinforcement learning (RL). However, VINs struggle to scale to long-term and large-scale planning tasks, such as navigating a $100 \times 100$ maze -- a task which typically requires thousands of planning steps to solve. We observe that this deficiency is due to two issues: the representation capacity of the latent MDP and the planning module's depth. We address these by augmenting the latent MDP with a dynamic transition kernel, dramatically improving its representational capacity, and, to mitigate the vanishing gradient problem, introducing an "adaptive highway loss" that constructs skip connections to improve gradient flow. We evaluate our method on both 2D maze navigation environments and the ViZDoom 3D navigation benchmark. We find that our new method, named Dynamic Transition VIN (DT-VIN), easily scales to 5000 layers and casually solves challenging versions of the above tasks. Altogether, we believe that DT-VIN represents a concrete step forward in performing long-term large-scale planning in RL environments.
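The abstract builds on value iteration unrolled into network layers. As a point of reference, here is a minimal tabular sketch of that underlying recursion (the toy MDP and all names are illustrative, not from the paper's code):

```python
# Illustrative tabular value iteration -- the recursion that a VIN
# unrolls into network layers (a sketch, not the authors' implementation).
import numpy as np

def value_iteration(P, R, gamma=0.95, n_iters=100):
    """P: (A, S, S) transition kernel; R: (S, A) rewards.
    Each loop iteration plays the role of one VIN layer, so a plan
    needing thousands of steps needs thousands of iterations/layers."""
    V = np.zeros(R.shape[0])
    for _ in range(n_iters):
        # Q(s, a) = R(s, a) + gamma * sum_s' P(s'|s, a) * V(s')
        Q = R + gamma * np.einsum("asn,n->sa", P, V)
        V = Q.max(axis=1)  # greedy Bellman backup
    return V

# Toy 2-state MDP: state 0 steps into absorbing state 1 for reward 1.
P = np.array([[[0.0, 1.0],
               [0.0, 1.0]]])   # shape (A=1, S=2, S'=2)
R = np.array([[1.0],
              [0.0]])          # shape (S=2, A=1)
print(value_iteration(P, R, gamma=0.9, n_iters=50))  # ~[1. 0.]
```

In a VIN the explicit kernel `P` is replaced by a learned convolution shared across states; per the abstract, DT-VIN's dynamic transition kernel instead lets that kernel vary with the latent state, raising representational capacity.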
Problem

Research questions and friction points this paper is trying to address.

Scaling VINs for extreme long-term planning tasks
Improving latent MDP representation and gradient flow
Enabling large-scale planning in complex environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic transition kernel enhances latent MDP
Adaptive highway loss mitigates gradient vanishing
Scales to 5000 layers for long-term planning
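The adaptive highway loss is described here only at a high level. One illustrative reading (the function name, MSE choice, and depth-based weighting rule are assumptions, not the paper's exact formulation) is that value maps at intermediate depths receive auxiliary supervision, so gradients reach shallow layers directly instead of flowing through thousands of layers:

```python
# Schematic sketch of a depth-gated auxiliary loss (assumed form, not
# the paper's exact adaptive highway loss).
import numpy as np

def highway_loss(layer_value_maps, target, path_len):
    """layer_value_maps: per-layer value predictions (list of arrays).
    Supervise every layer deep enough to have propagated value over
    `path_len` planning steps; each such term is a gradient shortcut."""
    losses = []
    for depth, v in enumerate(layer_value_maps, start=1):
        if depth >= path_len:  # this layer can already represent the plan
            losses.append(np.mean((v - target) ** 2))
    return sum(losses) / max(len(losses), 1)
```

Gating on the plan length keeps the auxiliary targets consistent: a layer shallower than the required planning horizon cannot yet match the target, so it is left unsupervised.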
👥 Authors
Yuhui Wang
AI Initiative, King Abdullah University of Science and Technology (KAUST), Saudi Arabia
Qingyuan Wu
University of Southampton; University of Liverpool
Weida Li
National University of Singapore, Singapore
Dylan R. Ashley
Ph.D. Student, Dalle Molle Institute for Artificial Intelligence Research (IDSIA USI-SUPSI)
Francesco Faccio
Senior Research Scientist, Google DeepMind
Chao Huang
University of Southampton, England
Jürgen Schmidhuber
AI Initiative, King Abdullah University of Science and Technology (KAUST), Saudi Arabia; The Swiss AI Lab IDSIA/USI/SUPSI, Switzerland; NNAISENSE, Switzerland