Extremum Flow Matching for Offline Goal Conditioned Reinforcement Learning

📅 2025-05-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the scarcity of high-quality expert demonstrations in humanoid robot imitation learning, this paper proposes an offline goal-conditioned reinforcement learning framework that leverages open-ended, suboptimal play data. Methodologically, it introduces Flow Matching for extremal distribution estimation, exploiting deterministic transport and support for arbitrary source distributions to model the optimal behavioral boundaries of goal-conditioned policies. A modular architecture decouples and recombines critic, planner, actor, and world-model components to unify goal-conditioned imitation and reinforcement learning. The agents are evaluated on the OGBench benchmark and on a 2D non-prehensile pushing task, analyzing how demonstration behavior during data collection affects performance. The approach is also deployed on the Talos humanoid robot, performing multi-step vision-based pick-and-place and articulated object manipulation in a realistic kitchen environment. Key contributions include: (i) a Flow Matching-based method for estimating the extremum of a learned distribution; (ii) a unified, modular architecture for goal-conditioned imitation and reinforcement learning; and (iii) empirical validation in simulation and on real hardware.

📝 Abstract
Imitation learning is a promising approach for enabling generalist capabilities in humanoid robots, but its scaling is fundamentally constrained by the scarcity of high-quality expert demonstrations. This limitation can be mitigated by leveraging suboptimal, open-ended play data, often easier to collect and offering greater diversity. This work builds upon recent advances in generative modeling, specifically Flow Matching, an alternative to Diffusion models. We introduce a method for estimating the extremum of the learned distribution by leveraging the unique properties of Flow Matching, namely, deterministic transport and support for arbitrary source distributions. We apply this method to develop several goal-conditioned imitation and reinforcement learning algorithms based on Flow Matching, where policies are conditioned on both current and goal observations. We explore and compare different architectural configurations by combining core components, such as critic, planner, actor, or world model, in various ways. We evaluated our agents on the OGBench benchmark and analyzed how different demonstration behaviors during data collection affect performance in a 2D non-prehensile pushing task. Furthermore, we validated our approach on real hardware by deploying it on the Talos humanoid robot to perform complex manipulation tasks based on high-dimensional image observations, featuring a sequence of pick-and-place and articulated object manipulation in a realistic kitchen environment. Experimental videos and code are available at: https://hucebot.github.io/extremum_flow_matching_website/
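The abstract highlights two Flow Matching properties the method relies on: regression onto a velocity field along a probability path, and deterministic transport by ODE integration. A minimal sketch of those two pieces is below; `cfm_pair` and `euler_transport` are hypothetical names, the straight-line (linear interpolation) path is one common choice, and this is not the paper's actual implementation.

```python
import numpy as np

def cfm_pair(x0, x1, t):
    """Conditional Flow Matching training pair: the point x_t on the
    straight-line path between a source sample x0 and a data sample x1,
    and the target velocity (x1 - x0) that a network v_theta(x_t, t)
    would be trained to regress."""
    xt = (1.0 - t) * x0 + t * x1
    target_v = x1 - x0
    return xt, target_v

def euler_transport(x0, velocity, n_steps=100):
    """Deterministic transport: integrate dx/dt = velocity(x, t) with
    fixed-step Euler from t=0 (source sample) to t=1 (generated sample)."""
    x, dt = x0.copy(), 1.0 / n_steps
    for i in range(n_steps):
        t = i * dt
        x = x + dt * velocity(x, t)
    return x
```

Because the transport is a deterministic ODE (unlike the stochastic reverse process of diffusion models), the same source point always maps to the same output, and the source distribution is a free choice, which is what the paper exploits for extremum estimation.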
Problem

Research questions and friction points this paper is trying to address.

Mitigates expert data scarcity in imitation learning
Enhances goal-conditioned policies via Flow Matching
Validates performance on real humanoid robot tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Extremum estimation via Flow Matching's deterministic transport
Policies conditioned on current and goal observations
Modular combinations of critic, planner, actor, and world model
Quentin Rouxel
CUHK
Robotics · Humanoid Robots · Multi-Contact · Whole-Body Control · Imitation Learning
Clemente Donoso
Inria, CNRS, Université de Lorraine, France
Fei Chen
Department of Mechanical and Automation Engineering, T-Stone Robotics Institute, The Chinese University of Hong Kong, Hong Kong
S. Ivaldi
Inria, CNRS, Université de Lorraine, France
Jean-Baptiste Mouret
Inria
Robot learning · Quality Diversity · Evolutionary robotics · MAP-Elites · Neuroevolution