Guided Streaming Stochastic Interpolant Policy

📅 2026-05-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing generative robot policies struggle to combine low inference latency with high reactivity, which hinders dynamic goal alignment and obstacle avoidance. This work analyzes the time evolution of the value function via the backward Kolmogorov equation to derive the optimal guidance term for Stochastic Interpolants (SI), yielding a modified drift that theoretically guarantees sampling from a target distribution. Building on this result, it proposes the Streaming Stochastic Interpolant Policy (SSIP), which generalizes the deterministic Streaming Flow Policy (SFP) and supports two guidance modes: training-free zero-shot adaptation (STEG) and training-based amortized inference (CCG). In dynamic, unstructured environments, the guided streaming approach is reported to significantly outperform conventional chunk-based policies in reactivity while providing physically valid guidance.
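
For orientation, the backward Kolmogorov argument mentioned above has a well-known form for a generic diffusion (Doob's h-transform). The sketch below uses assumed notation that does not come from the paper: a base drift b, a constant noise scale σ, a non-negative terminal reward r, and a value function h. The paper's exact guidance term for stochastic interpolants may differ in form.

```latex
% Base (unguided) dynamics; b and \sigma are assumed notation
dX_t = b(X_t, t)\,dt + \sigma\,dW_t, \qquad t \in [0, 1]

% Value function for a terminal reward r \ge 0
h(x, t) = \mathbb{E}\left[\, r(X_1) \mid X_t = x \,\right]

% Backward Kolmogorov equation satisfied by h, with terminal condition
\partial_t h + b \cdot \nabla h + \tfrac{\sigma^2}{2}\,\Delta h = 0,
\qquad h(x, 1) = r(x)

% Guided dynamics: the drift is shifted by \sigma^2 \nabla \log h
dX_t = \left[\, b(X_t, t) + \sigma^2\,\nabla \log h(X_t, t) \,\right] dt + \sigma\,dW_t
```

For a fixed starting state, the terminal law of the guided process is proportional to the base terminal law multiplied by r, which is the sense in which a modified drift can guarantee sampling from a reweighted target distribution.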

📝 Abstract

Inference-time guidance is essential for steering generative robot policies toward dynamic objectives without retraining, yet existing methods are largely confined to chunk-based architectures that exhibit high latency and lack the reactivity needed for test-time preference alignment or obstacle avoidance. In this work, we formally derive the optimal guidance term for Stochastic Interpolants (SI) by analyzing the value function's time evolution via the Backward Kolmogorov Equation, establishing a modified drift that theoretically guarantees sampling from a target distribution. We apply this framework to real-time control through the Streaming Stochastic Interpolant Policy (SSIP), which generalizes the deterministic Streaming Flow Policy (SFP). Unifying this guidance law with the streaming architecture enables fast and reactive control. To support diverse deployment needs, we propose two complementary mechanisms: training-free Stochastic Trajectory Ensemble Guidance (STEG) that computes gradients on-the-fly for zero-shot adaptation, and training-based Conditional Critic Guidance (CCG) for amortized inference. Empirical evaluations demonstrate that our guided streaming approach significantly outperforms conventional chunk-based policies in reactivity and provides superior, physically valid guidance for dynamic, unstructured environments.
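
To make the idea of computing guidance gradients on the fly concrete, here is a toy sketch in Python. It is not the paper's STEG algorithm: it assumes a one-dimensional base process dX = σ dW with no learned drift, a Gaussian terminal reward centered on a goal, and a Monte Carlo estimate of ∇ log h, where h(x, t) is the expected terminal reward given the current state x. All function and parameter names (`ensemble_guidance`, `exact_guidance`, `guided_rollout`, `goal`, `s`) are hypothetical and chosen for illustration only.

```python
import numpy as np


def ensemble_guidance(x, t, goal, sigma=1.0, s=0.5, n_samples=256, rng=None):
    """Monte Carlo estimate of grad_x log h(x, t) for the toy process dX = sigma dW.

    h(x, t) = E[r(X_1) | X_t = x] with r(y) = exp(-(y - goal)^2 / (2 s^2)).
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.atleast_1d(np.asarray(x, dtype=float))
    # Sample an ensemble of terminal states; for Brownian motion this is
    # a single exact Gaussian step from time t to time 1.
    noise = rng.standard_normal((x.shape[0], n_samples))
    x1 = x[:, None] + sigma * np.sqrt(max(1.0 - t, 0.0)) * noise
    # Self-normalized reward weights, computed in log space for stability.
    log_w = -0.5 * ((x1 - goal) / s) ** 2
    w = np.exp(log_w - log_w.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    # Since X_1 = x + noise, grad_x h = E[r'(X_1)], so grad_x log h is the
    # reward-weighted mean of grad log r(X_1) = (goal - X_1) / s^2.
    return (w * (goal - x1) / s**2).sum(axis=1)


def exact_guidance(x, t, goal, sigma=1.0, s=0.5):
    """Closed-form grad_x log h for the same toy setup (Gaussian convolution)."""
    return (goal - np.asarray(x, dtype=float)) / (s**2 + sigma**2 * (1.0 - t))


def guided_rollout(x0, goal, n_steps=50, sigma=1.0, s=0.5, n_samples=256, seed=0):
    """Euler-Maruyama rollout with drift sigma^2 * grad log h, re-estimated each step."""
    rng = np.random.default_rng(seed)
    x = np.atleast_1d(np.array(x0, dtype=float))
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        # Guidance is recomputed at every step from the current state.
        drift = sigma**2 * ensemble_guidance(x, t, goal, sigma, s, n_samples, rng)
        x = x + drift * dt + sigma * np.sqrt(dt) * rng.standard_normal(x.shape)
    return x


if __name__ == "__main__":
    est = ensemble_guidance(0.0, 0.5, goal=1.0, n_samples=200_000,
                            rng=np.random.default_rng(0))
    print("MC estimate:", est[0], "closed form:", exact_guidance(0.0, 0.5, 1.0))
    x_final = guided_rollout(np.zeros(2000), goal=1.0)
    print("mean terminal state under guidance:", x_final.mean())
```

In this toy setting the closed form is available, so the ensemble estimate can be checked directly. With σ = 1 and s = 0.5, the guided terminal distribution started from 0 is Gaussian with mean 0.8, which the rollout should approximately reproduce. Re-estimating the guidance at every step is what allows the goal to change mid-rollout, at the cost of one ensemble evaluation per control step.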

Problem

Research questions and friction points this paper is trying to address.

inference-time guidance

reactivity

chunk-based policies

dynamic objectives

real-time control

Innovation

Methods, ideas, or system contributions that make the work stand out.

Stochastic Interpolants

Inference-time Guidance

Streaming Policy