Flow-based Domain Randomization for Learning and Sequencing Robotic Skills

📅 2025-02-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Policies trained in simulation are often not robust enough to transfer to the real world, and the environment distributions used for domain randomization are typically specified by hand. Method: The paper proposes a normalizing-flow-based automatic domain randomization approach that learns the environment-parameter sampling distribution through entropy-regularized reward maximization, yielding a more flexible sampler than simpler parameterized distributions. The learned sampling distribution is then combined with a privileged value function for out-of-distribution (OOD) detection in an uncertainty-aware multi-step manipulation planner. Contribution/Results: Across six simulated domains and one real-world robotics domain, the approach is reported to be more flexible and to provide greater robustness than existing methods that learn simpler, parameterized sampling distributions.
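
One generic way to write an entropy-regularized objective over a flow-based sampling distribution is sketched below. The symbols are assumptions introduced here for illustration, not notation taken from the paper: \xi denotes environment parameters, p_\theta the learned sampling distribution, J(\pi, \xi) the return of policy \pi in environment \xi, \alpha the entropy weight, and f_\theta an invertible map from a base density p_Z. The paper's exact objective may contain additional terms. The second line is the standard change-of-variables formula that lets a normalizing flow evaluate its own density.

```latex
\max_{\theta}\; \mathbb{E}_{\xi \sim p_\theta}\!\left[ J(\pi, \xi) \right] + \alpha\, \mathcal{H}(p_\theta),
\qquad
\mathcal{H}(p_\theta) = -\,\mathbb{E}_{\xi \sim p_\theta}\!\left[ \log p_\theta(\xi) \right]

\log p_\theta(\xi) = \log p_Z\!\left( f_\theta^{-1}(\xi) \right)
  + \log \left| \det \frac{\partial f_\theta^{-1}(\xi)}{\partial \xi} \right|
```

The reward term pushes probability mass toward environments where the policy succeeds, while the entropy term keeps the distribution as broad as possible.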

📝 Abstract

Domain randomization in reinforcement learning is an established technique for increasing the robustness of control policies trained in simulation. By randomizing environment properties during training, the learned policy can become robust to uncertainties along the randomized dimensions. While the environment distribution is typically specified by hand, in this paper we investigate automatically discovering a sampling distribution via entropy-regularized reward maximization of a normalizing-flow-based neural sampling distribution. We show that this architecture is more flexible and provides greater robustness than existing approaches that learn simpler, parameterized sampling distributions, as demonstrated in six simulated and one real-world robotics domain. Lastly, we explore how these learned sampling distributions, combined with a privileged value function, can be used for out-of-distribution detection in an uncertainty-aware multi-step manipulation planner.
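
To make "entropy-regularized reward maximization of a sampling distribution" concrete, here is a minimal sketch under strong simplifying assumptions. The sampler is a one-layer affine flow over a single environment parameter, which is equivalent to a Gaussian and far less expressive than the neural flow the abstract describes. The `reward` function is a hypothetical stand-in for the policy's return, and the score-function gradient estimator is a choice made here, not something the abstract specifies.

```python
import math
import random


def reward(xi, center=0.5, width=0.5):
    """Hypothetical stand-in for the policy's return when the
    environment parameter equals xi (highest near `center`)."""
    return math.exp(-((xi - center) ** 2) / (2.0 * width ** 2))


def train_sampler(steps=500, batch=256, lr=0.05, alpha=0.1, seed=0):
    """Stochastic gradient ascent on E[reward] + alpha * entropy.

    The sampler is xi = mu + exp(s) * z with z ~ N(0, 1).
    Returns the learned (mu, sigma).
    """
    rng = random.Random(seed)
    mu, s = 0.0, math.log(0.3)  # arbitrary initial sampler
    for _ in range(steps):
        sigma = math.exp(s)
        zs = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        rs = [reward(mu + sigma * z) for z in zs]
        # Batch-mean baseline for variance reduction (adds a small O(1/batch) bias).
        baseline = sum(rs) / batch
        # Score function of the flow density:
        #   d log p / d mu = z / sigma,   d log p / d s = z^2 - 1
        g_mu = sum((r - baseline) * z / sigma for r, z in zip(rs, zs)) / batch
        g_s = sum((r - baseline) * (z * z - 1.0) for r, z in zip(rs, zs)) / batch
        # The entropy of this affine flow is s + const, so its gradient w.r.t. s is 1.
        g_s += alpha
        mu += lr * g_mu
        s += lr * g_s
    return mu, math.exp(s)


if __name__ == "__main__":
    print("with entropy bonus:   ", train_sampler(alpha=0.1))
    print("without entropy bonus:", train_sampler(alpha=0.0))
```

In this toy setting the entropy bonus should keep the learned sampler noticeably wider than pure reward maximization does, which is the intuition behind using entropy regularization to cover as many environments as the policy can handle.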

Problem

Research questions and friction points this paper is trying to address.

Automating domain randomization distribution discovery

Enhancing robustness in robotic skills learning

Out-of-distribution detection in manipulation planning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Normalizing-flow-based neural sampling distribution

Entropy-regularized reward maximization

Uncertainty-aware multi-step manipulation planner
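
The abstract does not spell out how the learned sampling distribution and the privileged value function are combined, so the following is only one plausible reading, sketched under stated assumptions. An environment parameter is flagged as out-of-distribution when its log-density under the learned sampler falls below a threshold, and a skill's success estimate averages a value function over belief particles while conservatively counting OOD particles as failures. All names here (`log_prob`, `is_ood`, `skill_success_estimate`, `value_fn`, `threshold`) are hypothetical, and the density is again a one-layer affine flow rather than a neural flow.

```python
import math


def log_prob(xi, mu, sigma):
    """Log-density of the affine flow xi = mu + sigma * z, z ~ N(0, 1),
    computed with the change-of-variables formula."""
    z = (xi - mu) / sigma
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi) - math.log(sigma)


def is_ood(xi, mu, sigma, threshold):
    """Flag xi as out-of-distribution if its log-density is below threshold."""
    return log_prob(xi, mu, sigma) < threshold


def skill_success_estimate(particles, value_fn, mu, sigma, threshold):
    """Average the (hypothetical) privileged value over belief particles,
    treating OOD particles as contributing zero expected success."""
    total = 0.0
    for xi in particles:
        if not is_ood(xi, mu, sigma, threshold):
            total += value_fn(xi)
    return total / len(particles)


if __name__ == "__main__":
    mu, sigma = 0.5, 0.2
    # Assumed cutoff: the log-density two standard deviations from the mean.
    threshold = log_prob(mu + 2.0 * sigma, mu, sigma)
    belief = [0.5, 0.6, 1.5]  # hypothetical belief particles over xi
    print(skill_success_estimate(belief, lambda xi: 1.0, mu, sigma, threshold))
```

A planner could compare such estimates across candidate skill sequences and prefer information-gathering actions when too many particles fall outside the training distribution, though that design is an assumption rather than a description of the paper's planner.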

🔎 Similar Papers

Task-unaware Lifelong Robot Learning with Retrieval-based Weighted Local Adaptation