🤖 AI Summary
Traditional Value Iteration Networks (VINs) struggle to scale to long-horizon, large-scale planning tasks that require thousands of steps (e.g., navigation in 100×100 mazes), owing to the limited representational capacity of the latent MDP and to vanishing gradients in very deep planning modules. To address this, the authors propose Dynamic Transition VIN (DT-VIN): a deeply unrolled value iteration architecture that strengthens the latent MDP representation with a dynamic transition kernel and introduces an adaptive highway loss that builds gradient shortcuts, enabling end-to-end differentiable value iteration with up to 5,000 layers. Evaluated on 2D maze navigation and the ViZDoom 3D navigation benchmark, DT-VIN scales to these depths and solves challenging long-horizon versions of both tasks.
📝 Abstract
The Value Iteration Network (VIN) is an end-to-end differentiable architecture that performs value iteration on a latent MDP for planning in reinforcement learning (RL). However, VINs struggle to scale to long-term and large-scale planning tasks, such as navigating a $100 \times 100$ maze, a task that typically requires thousands of planning steps to solve. We observe that this deficiency is due to two issues: the representation capacity of the latent MDP and the planning module's depth. We address these by augmenting the latent MDP with a dynamic transition kernel, dramatically improving its representational capacity, and, to mitigate the vanishing gradient problem, introducing an "adaptive highway loss" that constructs skip connections to improve gradient flow. We evaluate our method on both 2D maze navigation environments and the ViZDoom 3D navigation benchmark. We find that our new method, named Dynamic Transition VIN (DT-VIN), easily scales to 5000 layers and readily solves challenging versions of the above tasks. Altogether, we believe that DT-VIN represents a concrete step forward in performing long-term large-scale planning in RL environments.
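To make the idea of value iteration with a position-dependent transition kernel more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a 2D grid, a 3×3 neighbourhood, and a kernel tensor of shape `(H, W, A, 3, 3)` that is passed in directly; in a learned model such kernels would presumably be predicted from the observation. The function names `make_shift_kernels` and `value_iteration_dynamic` are hypothetical, and the adaptive highway loss is not modelled here, since the abstract does not specify its form.

```python
import numpy as np


def make_shift_kernels(height, width, gamma=0.9):
    """Hypothetical helper: one 3x3 kernel per cell and per action.

    Actions are (up, down, left, right). Each kernel puts weight `gamma`
    on the neighbour the action moves to. These are hand-built so the
    example is deterministic; they stand in for learned kernels.
    """
    # Index of the target neighbour inside the 3x3 patch around a cell.
    offsets = [(0, 1), (2, 1), (1, 0), (1, 2)]
    kernels = np.zeros((height, width, len(offsets), 3, 3))
    for action, (i, j) in enumerate(offsets):
        kernels[:, :, action, i, j] = gamma
    return kernels


def value_iteration_dynamic(reward, kernels, num_iters):
    """Unrolled value iteration where every cell has its own kernel.

    reward:  (H, W) array of per-cell rewards.
    kernels: (H, W, A, 3, 3) array, one 3x3 kernel per cell and action.
    Returns the (H, W) value map after `num_iters` sweeps.
    """
    height, width = reward.shape
    value = np.zeros((height, width))
    for _ in range(num_iters):
        # Zero-pad so border cells also have a full 3x3 neighbourhood.
        padded = np.pad(value, 1)
        # patches[h, w] is the 3x3 block of values centred on cell (h, w).
        patches = np.lib.stride_tricks.sliding_window_view(padded, (3, 3))
        # Q(h, w, a) = R(h, w) + sum_ij K(h, w, a, i, j) * V_patch(h, w, i, j)
        q = reward[:, :, None] + np.einsum("hwaij,hwij->hwa", kernels, patches)
        # Greedy backup over actions.
        value = q.max(axis=2)
    return value


# Usage: a 3x3 grid with a single rewarding cell in the centre.
reward = np.zeros((3, 3))
reward[1, 1] = 1.0
kernels = make_shift_kernels(3, 3, gamma=0.9)
values = value_iteration_dynamic(reward, kernels, num_iters=2)
```

Each loop iteration corresponds to one layer of the unrolled planner, so a maze that needs thousands of planning steps needs thousands of such layers. A standard VIN would share one kernel across all cells; giving each cell its own kernel is what this sketch takes "dynamic" to mean.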