🤖 AI Summary
Although pretrained visual encoders offer strong semantic representations, their latent spaces are not optimized for planning tasks and often encode irrelevant or distracting information, leading to instability in gradient-based planning. Inspired by the human "perceptual straightening" hypothesis, this work is the first to introduce a temporal straightening mechanism into latent-space planning. By jointly training the encoder and dynamics predictor with a curvature regularizer, the method encourages locally straighter latent trajectories, thereby reducing the gap between Euclidean and geodesic distances and improving the conditioning of the planning objective. Experiments on multiple goal-reaching tasks demonstrate significant gains in both planning success rates and the stability of gradient-based optimization.
📝 Abstract
Learning good representations is essential for latent planning with world models. While pretrained visual encoders produce strong semantic visual features, they are not tailored to planning and contain information irrelevant -- or even detrimental -- to planning. Inspired by the perceptual straightening hypothesis in human visual processing, we introduce temporal straightening to improve representation learning for latent planning. Using a curvature regularizer that encourages locally straightened latent trajectories, we jointly learn an encoder and a predictor. We show that reducing curvature this way makes the Euclidean distance in latent space a better proxy for the geodesic distance and improves the conditioning of the planning objective. We demonstrate empirically that temporal straightening makes gradient-based planning more stable and yields significantly higher success rates across a suite of goal-reaching tasks.
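To make the idea of a curvature regularizer concrete, here is a minimal sketch of one plausible formulation: penalizing the angle between consecutive displacement vectors along a latent trajectory, so the penalty is zero for a perfectly straight path. The exact loss used in the paper may differ; this function and its name are illustrative assumptions.

```python
import numpy as np

def curvature_penalty(z):
    """Illustrative curvature regularizer (assumed form, not the paper's exact loss).

    z: array of shape (T, d), a trajectory of T latent states.
    Returns the mean of (1 - cos theta_t) over consecutive displacement
    vectors: 0 for a perfectly straight trajectory, larger when it bends.
    """
    d = np.diff(z, axis=0)                                   # (T-1, d) displacements
    d = d / (np.linalg.norm(d, axis=1, keepdims=True) + 1e-8)  # unit directions
    cos = np.sum(d[:-1] * d[1:], axis=1)                     # cosine between successive steps
    return float(np.mean(1.0 - cos))

# A straight latent trajectory incurs (near-)zero penalty;
# a right-angle bend incurs a penalty of 1 (cos 90° = 0).
straight = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
bent = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
print(curvature_penalty(straight))  # ≈ 0.0
print(curvature_penalty(bent))      # ≈ 1.0
```

In training, such a term would be added to the predictor's objective with a weighting coefficient, jointly shaping the encoder and dynamics model toward straighter latent rollouts.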