Beyond Euclidean Proximity: Repairing Latent World Models with Horizon-Matched Trajectory Reachability Metrics

📅 2026-05-21

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Latent world models commonly rank candidate plans by the Euclidean distance between the predicted terminal latent state and the goal latent state. That distance can be dominated by task-irrelevant dimensions, so it often misjudges which trajectories actually reach the goal. This work proposes trajectory reachability metrics (TRM): a lightweight pairwise metric head, trained on logged trajectories with horizon-matched temporal separations, that replaces or augments the original terminal cost without modifying the underlying world model. The paper presents TRM as a post-hoc repair of the planning interface for fixed latent world models, and its analysis shows that the task-relevant signal occupies a very small but decisive subspace: in TwoRoom, the XY-probe rowspace accounts for less than 1% of terminal-goal latent MSE yet carries most of the candidate-quality signal. On the TwoRoom benchmark, TRM raises success from 7.0% to 97.0% for LeWM and from 32.7% to 84.0% for PLDM, improving both candidate trajectory ranking and the endpoint the planner selects.
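The failure mode described in the summary can be illustrated with a toy example. The following is a minimal sketch, not the paper's implementation: the 4-dimensional latent, the split into two "position" and two "nuisance" dimensions, the `standin_metric_head` function, and the `raw_weight` value are all assumptions made for illustration. A real TRM head would be learned from logged trajectories rather than hand-written.

```python
import numpy as np

def raw_latent_cost(terminals, goal):
    """Squared Euclidean distance between each terminal latent and the goal latent."""
    return np.sum((terminals - goal) ** 2, axis=1)

def standin_metric_head(terminals, goal):
    """Hypothetical stand-in for a trained pairwise head.

    It scores only the first two latent dimensions, which this toy treats
    as the task-relevant position variables.
    """
    return np.sum((terminals[:, :2] - goal[:2]) ** 2, axis=1)

def hybrid_cost(terminals, goal, raw_weight=0.05):
    """Metric-head cost plus a small raw-latent term (the weight is an assumption)."""
    return standin_metric_head(terminals, goal) + raw_weight * raw_latent_cost(terminals, goal)

goal = np.array([1.0, 1.0, 0.0, 0.0])
terminals = np.array([
    [1.0, 1.0, 3.0, 3.0],  # candidate 0: at the goal position, nuisance dims differ
    [0.0, 0.0, 0.0, 0.0],  # candidate 1: wrong position, nuisance dims match
])

raw_pick = int(np.argmin(raw_latent_cost(terminals, goal)))         # costs [18, 2] -> 1
metric_pick = int(np.argmin(standin_metric_head(terminals, goal)))  # costs [0, 2] -> 0
hybrid_pick = int(np.argmin(hybrid_cost(terminals, goal)))          # costs [0.9, 2.1] -> 0
print(raw_pick, metric_pick, hybrid_pick)
```

In this toy, raw latent distance prefers the candidate that ends in the wrong place because the nuisance dimensions dominate the sum, while the replacement and hybrid costs both select the candidate at the goal position. The world model and the candidates are untouched; only the ranking function changes.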

📝 Abstract

Latent world models can contain the state needed for control, yet their terminal-cost interface can expose the planner to the wrong decision-relevant information. In common latent MPC, candidate sequences are ranked by Euclidean distance between predicted terminal and goal latent states; this assumes that raw latent distance weights reachability-relevant variables correctly. We propose trajectory reachability metrics (TRM), a post-hoc terminal-ranking method for fixed latent world models. TRM trains a small pairwise head from logged trajectory structure and uses it as a replacement or hybrid cost; the encoder, dynamics, sampler, optimizer, and evaluation manifests remain fixed. The key design choice is horizon-aware supervision: the metric is trained on broad, balanced temporal separations to match the long-horizon terminal candidate ranking problem. On a hard TwoRoom benchmark, raw latent planning with LeWorldModel (LeWM) reaches 7.0% success, while full-horizon TRM reaches 97.0%; shuffled temporal-label controls stay at 0.0%. The same recipe improves a PLDM baseline from 32.7% to 84.0% across three seeds, and a short-horizon TRM variant reaches only 35.0% with the 100,000 pair budget. In TwoRoom, we provide mechanistic evidence for why TRM works: XY position is linearly decodable (R^2=0.998), yet raw latent MSE misranks candidates; the XY-probe rowspace accounts for less than 1% of terminal-goal latent MSE but carries most candidate-quality signal; and SCSA audits show that TRM improves the ordering and selected endpoint seen by the planner. On PushT go50/go75, TRM-style task-state metrics improve SCSA ranking and selected final distance more cleanly than closed-loop success, motivating auxiliary hybrid costs in continuous manipulation. TRM is the planner-facing repair, and audits explain when terminal reachability metrics should replace or augment raw latent proximity.
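The horizon-aware supervision described in the abstract can be sketched as a pair sampler over a logged trajectory. This is one possible reading under stated assumptions, not the authors' procedure: the function name `sample_balanced_pairs`, the uniform distribution over temporal separations, and the use of the raw step gap as the training target are all hypothetical choices. A short-horizon variant corresponds to a small `max_sep`.

```python
import numpy as np

def sample_balanced_pairs(traj_len, n_pairs, max_sep, rng):
    """Sample index pairs (i, j) with j > i from one trajectory.

    The temporal separation j - i is drawn uniformly from 1..max_sep first,
    so every separation is equally represented; the start index is then
    drawn uniformly from the positions that keep j inside the trajectory.
    """
    max_sep = min(max_sep, traj_len - 1)
    seps = rng.integers(1, max_sep + 1, size=n_pairs)
    # The upper bound is exclusive and broadcasts per pair, so start + sep <= traj_len - 1.
    starts = rng.integers(0, traj_len - seps)
    return starts, starts + seps, seps

rng = np.random.default_rng(0)

# Broad separations, intended to match a long planning horizon.
i_full, j_full, sep_full = sample_balanced_pairs(traj_len=50, n_pairs=1000, max_sep=49, rng=rng)

# Short-horizon variant: the head only ever sees nearby pairs.
i_short, j_short, sep_short = sample_balanced_pairs(traj_len=50, n_pairs=1000, max_sep=5, rng=rng)

print(sep_full.min(), sep_full.max(), sep_short.max())
```

The returned separations could serve as regression targets for a pairwise head, and permuting them across pairs would give a shuffled-label control in the spirit of the one the abstract reports.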

Problem

Research questions and friction points this paper is trying to address.

latent world models

trajectory reachability

terminal-cost ranking

planning

Euclidean proximity

Innovation

Methods, ideas, or system contributions that make the work stand out.

trajectory reachability metrics

latent world models

horizon-matched supervision