Trajectory First: A Curriculum for Discovering Diverse Policies

📅 2025-06-02
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address insufficient policy diversity and limited exploration in complex robotic manipulation tasks under reinforcement learning, this paper proposes a trajectory-first curriculum learning paradigm. First, it models the high-dimensional behavioral trajectory distribution to enable global exploration; then, it transfers trajectory-level knowledge to step-wise policies via hierarchical policy distillation. This approach overcomes the local exploration bottleneck imposed by conventional skill-space constraints, integrating constrained diversity optimization, trajectory clustering, and hybrid training—comprising behavior cloning initialization followed by PPO fine-tuning. Evaluated on multi-stage robotic manipulation tasks, the method achieves a 42% improvement in policy diversity (measured by Jensen–Shannon divergence) over skill-based baselines, while demonstrating superior robustness and cross-task generalization capability.

Technology Category

Application Category

📝 Abstract
Being able to solve a task in diverse ways makes agents more robust to task variations and less prone to local optima. In this context, constrained diversity optimization has emerged as a powerful reinforcement learning (RL) framework to train a diverse set of agents in parallel. However, existing constrained-diversity RL methods often under-explore in complex tasks such as robotic manipulation, leading to a lack in policy diversity. To improve diversity optimization in RL, we therefore propose a curriculum that first explores at the trajectory level before learning step-based policies. In our empirical evaluation, we provide novel insights into the shortcoming of skill-based diversity optimization, and demonstrate empirically that our curriculum improves the diversity of the learned skills.
Problem

Research questions and friction points this paper is trying to address.

Enhancing policy diversity in reinforcement learning for robustness
Addressing under-exploration in complex tasks like robotic manipulation
Improving diversity optimization via trajectory-level curriculum learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Curriculum-based trajectory exploration first
Diverse step-based policy learning
Improved skill diversity optimization
🔎 Similar Papers
No similar papers found.