AI Summary
Existing video-based generative robot learning approaches suffer from unstable generation quality, limited fine-grained manipulation capability, absence of environmental feedback integration, and scarcity of real-world demonstration data. To address these limitations, we propose GenFlowRL, a novel framework that introduces the first generative object-centric optical flow model. This model extracts low-dimensional, disentangled object motion representations from heterogeneous (simulated and real) visual data, enabling differentiable reward shaping. By unifying video generation, inverse dynamics modeling, and reinforcement learning, GenFlowRL mitigates the detrimental impact of video generation uncertainty on policy optimization. We evaluate GenFlowRL on ten diverse manipulation tasks spanning simulation and real-world settings. Results demonstrate significant improvements over state-of-the-art baselines, validating its generalization capability, robustness to domain shift, and cross-platform adaptability.
Abstract
Recent advances have shown that video generation models can enhance robot learning by deriving effective robot actions through inverse dynamics. However, these methods depend heavily on the quality of the generated data and struggle with fine-grained manipulation because they lack environment feedback. While video-based reinforcement learning improves policy robustness, it remains constrained by the uncertainty of video generation and by the difficulty of collecting large-scale robot datasets for training diffusion models. To address these limitations, we propose GenFlowRL, which derives shaped rewards from flow generated by a model trained on diverse cross-embodiment datasets. This enables learning generalizable and robust policies from diverse demonstrations using low-dimensional, object-centric features. Experiments on 10 manipulation tasks, in both simulation and real-world cross-embodiment evaluations, demonstrate that GenFlowRL effectively leverages manipulation features extracted from generated object-centric flow, consistently achieving superior performance across diverse and challenging scenarios. Project page: https://colinyu1.github.io/genflowrl
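To make the idea of a flow-derived shaped reward concrete, the following is a minimal sketch and not the GenFlowRL reward itself, whose exact form is not given here. It assumes the generated object-centric flow is available as a list of 2D target keypoint positions for the current step, and that the environment reports the observed positions of the same keypoints. The function name `flow_matching_reward`, the exponential shaping, and the `scale` parameter are all hypothetical choices made for illustration.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def flow_matching_reward(
    observed: Sequence[Point],
    target: Sequence[Point],
    scale: float = 1.0,
) -> float:
    """Hypothetical dense reward in (0, 1] from keypoint agreement.

    observed: object keypoints measured in the environment at this step.
    target:   keypoints predicted by the generated flow for this step.
    scale:    assumed sharpness parameter; larger values penalize
              deviations from the target flow more strongly.
    """
    if len(observed) != len(target) or not observed:
        raise ValueError("observed and target must be non-empty and equal in length")
    # Mean Euclidean distance between matched keypoints.
    mean_dist = sum(math.dist(o, t) for o, t in zip(observed, target)) / len(observed)
    # Exponential shaping: 1.0 on a perfect match, decaying toward 0.
    return math.exp(-scale * mean_dist)


# Illustrative values only: two keypoints in normalized image coordinates.
target_kp = [(0.5, 0.5), (0.6, 0.5)]
near_kp = [(0.45, 0.5), (0.55, 0.5)]
far_kp = [(0.1, 0.1), (0.2, 0.1)]

reward_near = flow_matching_reward(near_kp, target_kp)
reward_far = flow_matching_reward(far_kp, target_kp)
```

Because the reward depends only on a handful of object keypoints rather than full video frames, a signal of this kind stays low-dimensional and does not depend on the robot's embodiment, which is the property the abstract attributes to object-centric flow features.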