🤖 AI Summary
Existing methods for learning geometric horizon models (GHMs) bootstrap on their own predictions during training, which degrades long-horizon, high-fidelity state forecasting. To address this, we propose Temporal Difference Flows (TD-Flow), a flow-based approach to learning GHMs built on a novel Bellman equation over probability paths. TD-Flow combines flow matching with a generative analog of temporal difference learning to predict future state distributions directly, avoiding the step-by-step error accumulation of unrolled world models. We establish a convergence result and attribute the method's gains primarily to reduced gradient variance during training. Empirically, TD-Flow learns accurate GHMs at over 5× the horizon length of prior methods and improves both generative metrics and policy evaluation accuracy across diverse domains. When integrated with behavior foundation models for planning over pre-trained policies, it delivers substantial performance gains on long-horizon tasks.
📝 Abstract
Predictive models of the future are fundamental for an agent's ability to reason and plan. A common strategy learns a world model and unrolls it step-by-step at inference, where small errors can rapidly compound. Geometric Horizon Models (GHMs) offer a compelling alternative by directly making predictions of future states, avoiding cumulative inference errors. While GHMs can be conveniently learned by a generative analog to temporal difference (TD) learning, existing methods are negatively affected by bootstrapping predictions at train time and struggle to generate high-quality predictions at long horizons. This paper introduces Temporal Difference Flows (TD-Flow), which leverages the structure of a novel Bellman equation on probability paths alongside flow-matching techniques to learn accurate GHMs at over 5x the horizon length of prior methods. Theoretically, we establish a new convergence result and primarily attribute TD-Flow's efficacy to reduced gradient variance during training. We further show that similar arguments can be extended to diffusion-based methods. Empirically, we validate TD-Flow across a diverse set of domains on both generative metrics and downstream tasks including policy evaluation. Moreover, integrating TD-Flow with recent behavior foundation models for planning over pre-trained policies demonstrates substantial performance gains, underscoring its promise for long-horizon decision-making.
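To make the object being learned concrete, here is a minimal tabular sketch of a geometric horizon model and the TD-style fixed point it satisfies. It is not the TD-Flow algorithm, which learns this distribution with flow matching in continuous state spaces. It assumes the common convention m = (1 - γ) Σ_{k≥0} γ^k P^{k+1}, where P is the state-transition matrix under a fixed policy; the 3-state chain, the discount value, and the function name are hypothetical and chosen only for illustration.

```python
import numpy as np

def geometric_horizon_measure(P, gamma):
    """Closed form of m = (1 - gamma) * sum_{k>=0} gamma^k P^(k+1) for a tabular chain.

    Row s of the result is the distribution over future states reached from s
    when the number of steps is drawn from a geometric distribution.
    """
    n = P.shape[0]
    # sum_{k>=0} (gamma * P)^k = (I - gamma * P)^{-1}, valid for 0 <= gamma < 1
    return (1.0 - gamma) * P @ np.linalg.inv(np.eye(n) - gamma * P)

# Hypothetical 3-state Markov chain under a fixed policy (illustrative only)
P = np.array([
    [0.9, 0.1, 0.0],
    [0.0, 0.8, 0.2],
    [0.3, 0.0, 0.7],
])
gamma = 0.95  # assumed discount factor

m = geometric_horizon_measure(P, gamma)

# TD-style (bootstrapped) fixed point: with probability (1 - gamma) stop at the
# next state, otherwise continue from the next state's own prediction.
bellman_rhs = (1.0 - gamma) * P + gamma * P @ m

print(np.allclose(m, bellman_rhs))     # fixed point holds
print(np.allclose(m.sum(axis=1), 1.0)) # each row is a probability distribution
```

In this tabular setting a single draw from a row of `m` gives a future state directly, without unrolling a one-step model. The bootstrapped right-hand side is the target a TD-style learner would regress toward.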