ReinforceGen: Hybrid Skill Policies with Automated Data Generation and Reinforcement Learning

📅 2025-12-18
🤖 AI Summary
Addressing the challenges of high task complexity and scarce training data in long-horizon robotic manipulation, this paper proposes a hierarchical skill-policy framework. First, complex tasks are decomposed into reusable local skills, with motion planning connecting them into a coherent action sequence. Second, high-quality training data are automatically generated from only ten human demonstrations and used for imitation learning. Third, online adaptation is combined with PPO/SAC-based reinforcement learning to jointly fine-tune the skill modules and the motion-planning targets. This work introduces the first closed-loop paradigm unifying "task decomposition, automated data generation, imitation learning, and reinforcement fine-tuning." Evaluated on Robosuite in the highest reset-range setting, the approach reaches an 80% success rate on all tasks with visuomotor control. Ablation studies show that the fine-tuning stage contributes an 89% average performance increase.

šŸ“ Abstract
Long-horizon manipulation has been a long-standing challenge in the robotics community. We propose ReinforceGen, a system that combines task decomposition, data generation, imitation learning, and motion planning to form an initial solution, and improves each component through reinforcement-learning-based fine-tuning. ReinforceGen first segments the task into multiple localized skills, which are connected through motion planning. The skills and motion planning targets are trained with imitation learning on a dataset generated from 10 human demonstrations, and then fine-tuned through online adaptation and reinforcement learning. When benchmarked on the Robosuite dataset, ReinforceGen reaches an 80% success rate on all tasks with visuomotor controls in the highest reset range setting. Additional ablation studies show that our fine-tuning approaches contribute to an 89% average performance increase. More results and videos are available at https://reinforcegen.github.io/

Problem

Research questions and friction points this paper is trying to address.

Develops hybrid skill policies for long-horizon robotic manipulation
Automates data generation and reinforcement learning for task improvement
Segments tasks into skills with imitation learning and motion planning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid skill policies combine task decomposition and motion planning
Automated data generation from human demonstrations enables imitation learning
Reinforcement learning fine-tuning improves component performance significantly
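
The execution structure described above, in which localized skills are connected through motion planning, can be illustrated with a minimal sketch. This is not the authors' implementation. All names here (`Skill`, `plan_motion`, `make_reach_skill`, `run_hybrid_policy`) are hypothetical, the state is a toy 2D position rather than a camera observation, the motion planner is replaced by straight-line interpolation, and the skill policies are hand-coded rather than learned by imitation and reinforcement learning.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Tuple

State = Tuple[float, float]  # toy stand-in for an end-effector pose


@dataclass
class Skill:
    """Hypothetical container for one localized skill."""
    name: str
    predict_target: Callable[[State], State]  # pose the planner should reach first
    policy: Callable[[State], State]          # closed-loop action (a small displacement)
    is_done: Callable[[State], bool]          # termination check for the skill
    max_steps: int = 50


def plan_motion(start: State, goal: State, n_waypoints: int = 5) -> List[State]:
    """Stand-in for a motion planner: straight-line interpolation to the goal."""
    return [
        (start[0] + (goal[0] - start[0]) * i / n_waypoints,
         start[1] + (goal[1] - start[1]) * i / n_waypoints)
        for i in range(1, n_waypoints + 1)
    ]


def make_reach_skill(name: str, goal: State, offset: State = (0.0, 0.2),
                     step: float = 0.05, tol: float = 1e-3) -> Skill:
    """Hand-coded skill: start from an offset pose, then move to the goal."""

    def predict_target(_: State) -> State:
        return (goal[0] + offset[0], goal[1] + offset[1])

    def policy(s: State) -> State:
        dx, dy = goal[0] - s[0], goal[1] - s[1]
        dist = math.hypot(dx, dy)
        if dist <= step:
            return (dx, dy)  # close enough to finish in one move
        return (dx * step / dist, dy * step / dist)

    def is_done(s: State) -> bool:
        return math.hypot(goal[0] - s[0], goal[1] - s[1]) < tol

    return Skill(name, predict_target, policy, is_done)


def run_hybrid_policy(state: State, skills: List[Skill]) -> Tuple[State, bool]:
    """Alternate a planned transit phase with a closed-loop skill phase."""
    for skill in skills:
        # Transit phase: follow the planned path to the skill's start pose.
        for waypoint in plan_motion(state, skill.predict_target(state)):
            state = waypoint
        # Skill phase: run the local policy until it terminates or times out.
        steps = 0
        done = skill.is_done(state)
        while not done and steps < skill.max_steps:
            dx, dy = skill.policy(state)
            state = (state[0] + dx, state[1] + dy)
            steps += 1
            done = skill.is_done(state)
        if not done:
            return state, False  # the skill timed out, so the task fails
    return state, True


final_state, success = run_hybrid_policy(
    (0.0, 0.0),
    [make_reach_skill("pick", (1.0, 0.0)), make_reach_skill("place", (0.5, 0.5))],
)
```

In this sketch, fine-tuning would correspond to updating both `predict_target` and `policy` for each skill, since the abstract states that the skills and the motion planning targets are both trained and then fine-tuned.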