Temporal Difference Flows

📅 2025-03-12

🤖 AI Summary

Existing methods for learning geometric horizon models (GHMs) are degraded by bootstrapping on their own predictions at train time, which limits the quality of long-horizon state predictions. This paper proposes Temporal Difference Flows (TD-Flow), which combines a novel Bellman equation on probability paths with flow matching to learn GHMs that predict future state distributions directly, avoiding the stepwise error accumulation of unrolled one-step world models. The authors establish a new convergence result, attribute TD-Flow's gains primarily to reduced gradient variance during training, and show that similar arguments extend to diffusion-based methods. Empirically, TD-Flow learns accurate GHMs at over 5x the horizon length of prior methods and is validated across diverse domains on generative metrics and downstream tasks such as policy evaluation. Integrated with behavior foundation models for planning over pre-trained policies, it yields substantial performance gains.

📝 Abstract

Predictive models of the future are fundamental for an agent's ability to reason and plan. A common strategy learns a world model and unrolls it step-by-step at inference, where small errors can rapidly compound. Geometric Horizon Models (GHMs) offer a compelling alternative by directly making predictions of future states, avoiding cumulative inference errors. While GHMs can be conveniently learned by a generative analog to temporal difference (TD) learning, existing methods are negatively affected by bootstrapping predictions at train time and struggle to generate high-quality predictions at long horizons. This paper introduces Temporal Difference Flows (TD-Flow), which leverages the structure of a novel Bellman equation on probability paths alongside flow-matching techniques to learn accurate GHMs at over 5x the horizon length of prior methods. Theoretically, we establish a new convergence result and primarily attribute TD-Flow's efficacy to reduced gradient variance during training. We further show that similar arguments can be extended to diffusion-based methods. Empirically, we validate TD-Flow across a diverse set of domains on both generative metrics and downstream tasks including policy evaluation. Moreover, integrating TD-Flow with recent behavior foundation models for planning over pre-trained policies demonstrates substantial performance gains, underscoring its promise for long-horizon decision-making.
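To make the object a GHM models more concrete, here is a minimal tabular sketch. It assumes the standard definition of the geometric-horizon (normalized discounted occupancy) distribution for a Markov chain under a fixed policy, which the abstract does not spell out. The 3-state transition matrix, the discount factor, and the function names are hypothetical, and this is not the paper's TD-Flow algorithm. It only illustrates the Bellman-style fixed point that TD learning of a GHM targets.

```python
import numpy as np

def geometric_horizon_closed_form(P, gamma):
    """Normalized discounted occupancy: (1 - gamma) * sum_{k>=0} gamma^k P^(k+1)."""
    n = P.shape[0]
    return (1.0 - gamma) * P @ np.linalg.inv(np.eye(n) - gamma * P)

def geometric_horizon_td(P, gamma, num_iters=500):
    """Repeat the Bellman backup M <- (1 - gamma) P + gamma P M."""
    n = P.shape[0]
    M = np.full((n, n), 1.0 / n)  # arbitrary initial guess; each row is a distribution
    for _ in range(num_iters):
        # Bootstrapping: the current estimate M appears on the right-hand side.
        M = (1.0 - gamma) * P + gamma * P @ M
    return M

# Hypothetical 3-state chain: P[s, x] is the probability of moving from s to x.
P = np.array([[0.9, 0.1, 0.0],
              [0.0, 0.8, 0.2],
              [0.3, 0.0, 0.7]])
gamma = 0.9  # assumed discount factor

M_exact = geometric_horizon_closed_form(P, gamma)
M_td = geometric_horizon_td(P, gamma)
```

Each row of `M_exact` is a distribution over future states reached after a geometrically distributed number of steps, so one sample from it stands in for an entire multi-step rollout. In continuous state spaces the matrix is replaced by a generative model, which is where the bootstrapped training targets discussed in the abstract come in.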

Problem

Research questions and friction points this paper is trying to address.

Errors that compound when a learned world model is unrolled step-by-step at inference.

Poor long-horizon prediction quality in existing Geometric Horizon Models, caused by bootstrapping on predictions at train time.

High gradient variance during training, which limits the accuracy of learned GHMs.
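The compounding-error problem can be illustrated with a toy calculation. The sketch below assumes a hypothetical scalar linear system and a one-step model whose coefficient is off by 1%; none of these values come from the paper.

```python
def rollout_error(a_true, a_model, x0, horizon):
    """Absolute error of the unrolled model after each step, for x_{t+1} = a * x_t."""
    x_true, x_model, errors = x0, x0, []
    for _ in range(horizon):
        x_true = a_true * x_true      # ground-truth dynamics
        x_model = a_model * x_model   # learned one-step model, applied to its own output
        errors.append(abs(x_model - x_true))
    return errors

# Hypothetical values: true coefficient 1.0, model coefficient 1.01.
errors = rollout_error(a_true=1.0, a_model=1.01, x0=1.0, horizon=100)
```

In this toy setting the one-step error is about 0.01, but after 100 unrolled steps the error exceeds the magnitude of the true state itself. A model that predicts the future state distribution directly avoids feeding its own outputs back in at inference.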

Innovation

Methods, ideas, or system contributions that make the work stand out.

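A novel Bellman equation on probability paths, whose structure TD-Flow exploits.

Flow-matching techniques that learn accurate GHMs at over 5x the horizon length of prior methods.

Reduced gradient variance during training, supported by a new convergence result.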
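For readers unfamiliar with flow matching, the following is a minimal sketch of the generic conditional flow-matching regression target on a straight-line probability path. It is not TD-Flow's objective, which additionally involves the Bellman structure described above. The function names, sample shapes, and the oracle velocity field are all hypothetical; in practice the velocity field would be a trained neural network.

```python
import numpy as np

def flow_matching_pair(x0, x1, t):
    """Point on the straight-line path from x0 to x1 and its velocity target."""
    x_t = (1.0 - t) * x0 + t * x1
    target_velocity = x1 - x0  # time derivative of x_t along the path
    return x_t, target_velocity

def flow_matching_loss(velocity_fn, x0, x1, t):
    """Mean squared error between predicted and target velocities."""
    x_t, target = flow_matching_pair(x0, x1, t)
    return np.mean(np.sum((velocity_fn(x_t, t) - target) ** 2, axis=-1))

def euler_sample(velocity_fn, x0, num_steps=10):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with Euler steps."""
    x, dt = x0.copy(), 1.0 / num_steps
    for k in range(num_steps):
        x = x + dt * velocity_fn(x, k * dt)
    return x

rng = np.random.default_rng(0)
x0 = rng.normal(size=(4, 2))         # noise samples
x1 = rng.normal(size=(4, 2)) + 3.0   # stand-in "data" samples
t = rng.uniform(size=(4, 1))         # one random time per pair

# Oracle velocity that already knows the pairing; used only to check the sketch.
oracle = lambda x, t: x1 - x0
loss = flow_matching_loss(oracle, x0, x1, t)
samples = euler_sample(oracle, x0)
```

With the oracle velocity the loss is zero and Euler integration carries each noise sample to its paired data sample. A learned velocity field is trained to minimize the same regression loss without access to the pairing at sampling time.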
