OFlow: Injecting Object-Aware Temporal Flow Matching for Robust Robotic Manipulation

📅 2026-04-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing vision-language-action (VLA) models struggle to combine temporal prediction with object perception in complex scenes: they typically act only on the current frame, and future prediction and object-aware reasoning are often learned in separate latent spaces, which limits robustness. This work proposes OFlow, a framework that unifies object perception and temporal dynamics within a shared semantic latent space. It forecasts future latents with temporal flow matching, factorizes them into object-aware representations, and conditions continuous action generation on them, emphasizing physically relevant cues while filtering out task-irrelevant variation. This joint modeling improves control under distribution shifts, with higher success rates and improved robustness reported on the LIBERO, LIBERO-Plus, MetaWorld, and SimplerEnv benchmarks and on real-world tasks.

📝 Abstract

Robust robotic manipulation requires not only predicting how the scene evolves over time, but also recognizing task-relevant objects in complex scenes. However, existing VLA models face two limitations: they typically act only on the current frame, and future prediction and object-aware reasoning are often learned in separate latent spaces. We propose OFlow (injecting Object-Aware Temporal Flow Matching into VLAs), a framework that addresses both limitations by unifying temporal foresight and object-aware reasoning in a shared semantic latent space. Our method forecasts future latents with temporal flow matching, factorizes them into object-aware representations that emphasize physically relevant cues while filtering task-irrelevant variation, and conditions continuous action generation on these predictions. By integrating OFlow into VLA pipelines, our method enables more reliable control under distribution shifts. Extensive experiments on the LIBERO, LIBERO-Plus, MetaWorld, and SimplerEnv benchmarks, as well as on real-world tasks, demonstrate that object-aware foresight consistently enhances robustness and success.
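
The abstract names temporal flow matching as the forecasting mechanism but does not spell out its formulation. As a rough illustration only, the sketch below shows generic flow matching with a straight-line probability path in NumPy. It is not the authors' implementation: the function names, the linear path, the choice of source sample `x0` (noise or a current latent; the abstract does not say which), and the Euler integrator are all assumptions made for this example.

```python
import numpy as np


def flow_matching_pair(x0, x1, t):
    """Build one training example on a straight-line path (assumed path choice).

    x0: source sample (hypothetical: noise or a current latent)
    x1: target sample (hypothetical: a future latent)
    t:  scalar time in [0, 1]
    Returns the interpolated point x_t and the target velocity x1 - x0.
    """
    x_t = (1.0 - t) * x0 + t * x1
    v_target = x1 - x0
    return x_t, v_target


def flow_matching_loss(v_pred, v_target):
    """Mean squared error between predicted and target velocities."""
    return float(np.mean((v_pred - v_target) ** 2))


def euler_forecast(x0, velocity_fn, n_steps=10):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    x = np.array(x0, dtype=float)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        x = x + dt * velocity_fn(x, k * dt)
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x0 = rng.normal(size=8)   # stand-in for a source latent
    x1 = rng.normal(size=8)   # stand-in for a future latent
    x_t, v = flow_matching_pair(x0, x1, t=0.3)
    # An oracle velocity field recovers x1 exactly on a straight-line path.
    forecast = euler_forecast(x0, lambda x, t: x1 - x0)
    print(flow_matching_loss(v, x1 - x0), np.allclose(forecast, x1))
```

In a real system the oracle velocity would be replaced by a learned network trained to minimize `flow_matching_loss`, and the forecast latent would then be passed on to whatever object-aware factorization and action head the model uses.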

Problem

Research questions and friction points this paper is trying to address.

robotic manipulation

temporal prediction

object-aware reasoning

vision-language-action models

distribution shift

Innovation

Methods, ideas, or system contributions that make the work stand out.

Object-Aware Temporal Flow Matching

Vision-Language-Action Models

Temporal Foresight