Stochastic Trajectory Optimization for Robotic Skill Acquisition From a Suboptimal Demonstration

📅 2024-08-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
Human demonstrations are often suboptimal: their shape reflects human preferences, but their dynamic performance is poor (e.g., slow motion and jitter). To let a robot imitate such demonstrations while improving on them, this paper proposes MSTOMP, a multi-policy variant of Stochastic Trajectory Optimization for Motion Planning (STOMP). Its main contributions are: (1) a cost function that combines a Dynamic Time Warping (DTW) measure of shape similarity with performance metrics such as collision cost; (2) frequency-domain gain control to denoise jittery demonstrations, together with MSES (Mean Square Error in the Spectrum), a computationally cheaper metric of trajectory difference in the frequency domain; and (3) a theoretical account of the connections between the time-domain and frequency-domain methods. In simulation and real-world robotic arm experiments, MSTOMP converges faster (2.3× on average) and optimizes more stably than existing methods, and it is more robust to parameter changes.

📝 Abstract
Learning from Demonstration (LfD) has emerged as a crucial method for robots to acquire new skills. However, when given suboptimal task trajectory demonstrations with shape characteristics reflecting human preferences but subpar dynamic attributes such as slow motion, robots not only need to mimic the behaviors but also optimize the dynamic performance. In this work, we leverage optimization-based methods to search for a superior-performing trajectory whose shape is similar to that of the demonstrated trajectory. Specifically, we use Dynamic Time Warping (DTW) to quantify the difference between two trajectories and combine it with additional performance metrics, such as collision cost, to construct the cost function. Moreover, we develop a multi-policy version of the Stochastic Trajectory Optimization for Motion Planning (STOMP), called MSTOMP, which is more stable and robust to parameter changes. To deal with the jitter in the demonstrated trajectory, we further utilize the gain-controlling method in the frequency domain to denoise the demonstration and propose a computationally more efficient metric, called Mean Square Error in the Spectrum (MSES), that measures the trajectories' differences in the frequency domain. We also theoretically highlight the connections between the time domain and the frequency domain methods. Finally, we verify our method in both simulation experiments and real-world experiments, showcasing its improved optimization performance and stability compared to existing methods. The source code can be found at https://ming-bot.github.io/MSTOMP.github.io.
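
The shape term described in the abstract can be illustrated with a textbook DTW distance. The sketch below is a minimal illustration, not the authors' implementation: the function names `dtw_distance` and `total_cost`, the weights `w_shape` and `w_collision`, and the weighted-sum form of the cost are all assumptions made for this example.

```python
import math


def dtw_distance(a, b):
    """Textbook O(len(a) * len(b)) DTW between two sequences of points (tuples)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # acc[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean distance between points
            acc[i][j] = d + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]


def total_cost(candidate, demo, collision_cost, w_shape=1.0, w_collision=1.0):
    """Hypothetical weighted sum of a DTW shape term and a collision term."""
    return w_shape * dtw_distance(candidate, demo) + w_collision * collision_cost(candidate)


# The demonstration dwells at (1, 0); the faster candidate traces the same path.
demo = [(0.0, 0.0), (1.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
fast = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
print(dtw_distance(fast, demo))  # prints 0.0
```

Because DTW aligns the two sequences before comparing them, a candidate that follows the same path at a different speed incurs no shape penalty, which is why it suits a setting where the shape should be kept and the timing improved.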
Problem

Research questions and friction points this paper is trying to address.

Optimizing suboptimal robotic trajectories from human demonstrations
Enhancing dynamic performance while preserving trajectory shape
Denoising and efficiently measuring trajectory differences in frequency domain
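
As a rough illustration of frequency-domain denoising, the sketch below transforms a one-dimensional trajectory with a discrete Fourier transform, scales the high-frequency bins by a gain, and transforms back. The paper's actual gain-control rule is not given on this page, so the hard cutoff, the `high_gain` parameter, and the function names are assumptions for this example only.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform (a library FFT would be used in practice)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Naive inverse DFT, normalized by 1/n."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def denoise(x, cutoff, high_gain=0.0):
    """Scale every bin above `cutoff` by `high_gain` (assumed rule), then invert."""
    n = len(x)
    X = dft(x)
    for k in range(n):
        freq = min(k, n - k)  # bins k and n - k carry the same frequency
        if freq > cutoff:
            X[k] *= high_gain
    # The gain is symmetric in k and n - k, so the result stays real.
    return [v.real for v in idft(X)]


n = 64
smooth = [math.sin(2 * math.pi * t / n) for t in range(n)]
jitter = [0.2 * math.sin(2 * math.pi * 20 * t / n) for t in range(n)]
noisy = [s + j for s, j in zip(smooth, jitter)]
cleaned = denoise(noisy, cutoff=3)
print(max(abs(c - s) for c, s in zip(cleaned, smooth)))  # close to zero
```

In this toy case the jitter sits entirely in one high-frequency bin pair, so zeroing it recovers the smooth motion almost exactly. Real demonstrations would need a gain profile chosen with more care.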
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses DTW for trajectory shape similarity
Develops MSTOMP for robust optimization
Proposes MSES for frequency domain analysis
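
A spectrum-based metric in the spirit of MSES can be sketched as the mean squared magnitude of the difference between two DFT spectra. The exact definition and normalization used in the paper are not given on this page, so the formula in `mses` below is an assumption. The comparison with a time-domain error uses Parseval's identity, one elementary link between the two domains and not necessarily the connection the authors derive.

```python
import cmath


def dft(x):
    """Naive O(n^2) discrete Fourier transform (a library FFT would be used in practice)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def mses(x, y):
    """Assumed form: mean over bins of |X_k - Y_k|^2 for equal-length signals."""
    if len(x) != len(y):
        raise ValueError("signals must have the same length")
    X, Y = dft(x), dft(y)
    return sum(abs(a - b) ** 2 for a, b in zip(X, Y)) / len(x)


def time_domain_sse(x, y):
    """Sum of squared sample-wise differences in the time domain."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


x = [0.0, 1.0, 2.0, 3.0]
y = [0.0, 1.5, 2.0, 2.5]
# Parseval: sum_k |X_k - Y_k|^2 = n * sum_t (x_t - y_t)^2, so the two values agree.
print(mses(x, y), time_domain_sse(x, y))
```

Unlike DTW, this comparison needs no dynamic-programming alignment table, which is consistent with the claim that a spectral metric can be cheaper to evaluate.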
Authors

Chenlin Ming, Shanghai Jiao Tong University (Robotics, ML, LLM)
Zitong Wang, The Department of Automation, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, China
Boxuan Zhang, School of Computation, Information and Technology in Technical University of Munich (TUM)
Xiaoming Duan, Shanghai Jiao Tong University
Jianping He, The Department of Automation, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, China
Zhanxiang Cao, Shanghai Jiao Tong University (Robotics, Reinforcement Learning, Legged Robot)