Latent Geometry Beyond Search: Amortizing Planning in World Models

📅 2026-05-09

🤖 AI Summary

This work addresses efficient goal-directed planning in vision-based world models without computationally expensive online search. It amortizes planning into a lightweight inverse-dynamics mapping in latent space, using the smooth and uniform geometric structure of a pretrained world model to predict actions directly from the current state, the goal state, and the number of remaining time steps. Shifting the planning burden from online optimization to learned inference in this way suggests that structured latent spaces can locally encode the geometry needed for planning. Combining a pretrained LeWorldModel, whose latent space is regularized for smoothness and uniformity, with a goal-conditioned inverse dynamics model (GC-IDM), the controller matches or exceeds the Cross-Entropy Method (CEM) in seven of eight environment-protocol settings across four benchmark environments while reducing per-decision computational cost by 100–130×.
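To make the cost of online search concrete, here is a minimal CEM sketch in plain Python. It is not the paper's planner: `toy_step` is a hypothetical stand-in for a learned latent dynamics model, and the population size, elite count, horizon, and iteration count are illustrative assumptions. The point is only to show that every decision requires many model rollouts.

```python
import random
import statistics


def toy_step(z, a):
    """Hypothetical latent dynamics (assumption): the action shifts the latent."""
    return [zi + ai for zi, ai in zip(z, a)]


def cem_plan(z0, z_goal, step_fn, horizon=5, pop=64, elites=8, iters=4, seed=0):
    """Cross-Entropy Method: sample plans, keep the best, refit a Gaussian."""
    rng = random.Random(seed)
    dim = len(z0)
    mean = [[0.0] * dim for _ in range(horizon)]
    std = [[1.0] * dim for _ in range(horizon)]
    calls = 0  # number of dynamics-model evaluations for this one decision
    for _ in range(iters):
        scored = []
        for _ in range(pop):
            plan = [[rng.gauss(mean[t][d], std[t][d]) for d in range(dim)]
                    for t in range(horizon)]
            z = list(z0)
            for a in plan:
                z = step_fn(z, a)
                calls += 1
            # Cost: squared latent distance to the goal at the end of the plan.
            cost = sum((zi - gi) ** 2 for zi, gi in zip(z, z_goal))
            scored.append((cost, plan))
        scored.sort(key=lambda item: item[0])
        best = [plan for _, plan in scored[:elites]]
        # Refit the sampling distribution to the elite plans.
        for t in range(horizon):
            for d in range(dim):
                vals = [p[t][d] for p in best]
                mean[t][d] = statistics.fmean(vals)
                std[t][d] = statistics.pstdev(vals) + 1e-6
    return mean[0], calls  # execute only the first action, then replan


first_action, model_calls = cem_plan([0.0, 0.0], [1.0, -1.0], toy_step)
print(first_action, model_calls)
```

With these assumed settings, one decision costs `iters * pop * horizon` dynamics evaluations, whereas an amortized controller issues a single forward pass. The 100–130× figure quoted above is the paper's reported reduction; this toy count is not meant to reproduce it.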

📝 Abstract

Modern vision-based world models can represent observations as compact yet expressive latent manifolds, but fast goal-oriented planning in these spaces remains challenging. This raises a central question: when does a learned representation simplify control, rather than merely enabling prediction? We study this question in a pretrained LeWorldModel, whose latent geometry is regularized for smoothness and uniformity. Our key insight is that, under such geometry, planning can be amortized into a latent inverse-dynamics mapping instead of requiring online search. We therefore replace iterative planning with a lightweight Goal-Conditioned Inverse Dynamics Model (GC-IDM) that maps the current latent state, goal latent state, and remaining horizon directly to the next action. Empirically, across four benchmark environments spanning navigation, contact-rich manipulation, and continuous control, our controller matches or exceeds CEM in seven of eight environment-protocol settings while reducing per-decision cost by 100-130x. A broader sweep over test-time planners (CEM, MPPI, iCEM, and gradient-based methods) shows that this result is not specific to a particular optimizer. These findings suggest that much of the structure recovered by test-time planning is already locally encoded in the latent representation. More broadly, our results indicate that sufficiently structured latent spaces can shift part of the planning burden from online optimization to learned inference.
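The interface of the GC-IDM described above can be sketched as a small feed-forward network. This is an illustrative assumption, not the authors' architecture: the layer sizes, the ReLU and tanh nonlinearities, the horizon normalization, the random untrained weights, and the names `init_mlp` and `gc_idm` are all hypothetical. Only the input-output contract (current latent, goal latent, remaining horizon in, next action out) comes from the abstract.

```python
import math
import random


def init_mlp(sizes, seed=0):
    """Random weights for a small MLP; sizes = [n_in, hidden..., n_out]."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)
        W = [[rng.uniform(-scale, scale) for _ in range(n_in)]
             for _ in range(n_out)]
        b = [0.0] * n_out
        layers.append((W, b))
    return layers


def gc_idm(layers, z_t, z_goal, steps_left, max_horizon):
    """One forward pass: (current latent, goal latent, horizon) -> action."""
    # Concatenate both latents with the normalized remaining horizon.
    x = list(z_t) + list(z_goal) + [steps_left / max_horizon]
    for i, (W, b) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b_j
             for row, b_j in zip(W, b)]
        if i < len(layers) - 1:
            x = [max(0.0, v) for v in x]  # ReLU on hidden layers
    # Assumption: actions are bounded in [-1, 1], so squash with tanh.
    return [math.tanh(v) for v in x]


LATENT_DIM, ACTION_DIM, MAX_H = 8, 2, 50  # hypothetical sizes
params = init_mlp([2 * LATENT_DIM + 1, 32, ACTION_DIM])
z_t = [0.1] * LATENT_DIM      # would come from the world model's encoder
z_goal = [0.5] * LATENT_DIM   # encoded goal observation
action = gc_idm(params, z_t, z_goal, steps_left=10, max_horizon=MAX_H)
print(action)
```

At control time the network would be called once per step with the remaining horizon decremented, so no sampling or iterative optimization happens online. In practice the weights would be trained, for example by regressing recorded actions from latent trajectories; the abstract does not specify the training procedure, so that detail is left open here.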

Problem

Research questions and friction points this paper is trying to address.

world models

latent geometry

goal-oriented planning

amortized inference

inverse dynamics

Innovation

Methods, ideas, or system contributions that make the work stand out.

amortized planning

latent geometry

inverse dynamics model