🤖 AI Summary
Closed-loop control for robotic manipulation tasks with free terminal time poses the challenge of jointly optimizing task duration and control inputs. Method: This paper proposes an end-to-end learning framework featuring three key innovations: (i) progressive time discretization, (ii) a terminal-time-adaptive QRnet architecture, and (iii) an automated initial value problem (IVP) enhanced sampling strategy. The closed-loop policy network is trained with optimal open-loop solutions as supervision signals, combining marching-based temporal refinement with dynamic IVP sampling. Contribution/Results: The approach overcomes stability and generalization bottlenecks in policy training under free terminal time. It adapts the terminal time across diverse tasks and a broad distribution of initial states, achieving total costs close to the global optimum. Empirical evaluation demonstrates significantly higher solution success rates and improved robustness compared to prior methods.
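For context, the underlying problem can be written in a standard free-terminal-time optimal control form (a generic sketch, not necessarily the paper's exact formulation):

$$
\min_{T,\,u(\cdot)} \; \int_0^T \ell\big(x(t), u(t)\big)\,dt + M\big(x(T)\big)
\quad \text{s.t.} \quad \dot{x}(t) = f\big(x(t), u(t)\big), \quad x(0) = x_0,
$$

where, unlike the fixed-horizon case, the duration $T$ is optimized jointly with the control $u(\cdot)$, so the optimal duration varies with the initial state; this is what the closed-loop policy must adapt to.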
📝 Abstract
This paper presents a novel approach to learning free terminal time closed-loop control for robotic manipulation tasks, enabling dynamic adjustment of task duration and control inputs to enhance performance. We extend the supervised learning approach, in which selected open-loop optimal control problems are solved and their solutions serve as training data for a policy network, to the free terminal time setting. Three main challenges are addressed in this extension. First, we introduce a marching scheme that gradually refines the time discretization, improving solution quality and increasing the success rate of the open-loop solver. Second, we extend the QRnet of Nakamura-Zimmerer et al. (2021b) to the free terminal time setting to address the discontinuity at the terminal state and improve stability there. Third, we present a more automated version of the initial value problem (IVP) enhanced sampling method of previous work (Zhang et al., 2022) that adaptively updates the training dataset, significantly improving its quality. By integrating these techniques, we develop a closed-loop policy that operates effectively over a broad domain with varying optimal time durations, achieving total costs close to the global optimum.
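To make the marching idea concrete, below is a minimal, self-contained sketch on a toy problem (not the paper's implementation or task): a 1D double integrator with a time-plus-effort cost, transcribed by single shooting with the horizon T as a decision variable, where each finer time grid is warm-started from the upsampled coarse solution. All names, the log-parametrization of T, and the soft terminal penalty are illustrative choices.

```python
import numpy as np
from scipy.optimize import minimize

def rollout(z, x0, n):
    """Single-shooting rollout of a 1D double integrator with Euler steps.
    Decision vector z = [log T, u_0, ..., u_{n-1}]; the log keeps T > 0."""
    T, u = np.exp(z[0]), z[1:]
    dt, x = T / n, np.array(x0, dtype=float)
    for uk in u:
        x = x + dt * np.array([x[1], uk])  # dq/dt = v, dv/dt = u
    return T, u, x

def objective(z, x0, x_goal, n, w=1e3):
    """Time-plus-effort running cost integrated over [0, T], plus a soft
    terminal penalty; the weight w and penalty form are illustrative."""
    T, u, xT = rollout(z, x0, n)
    return T / n * np.sum(1.0 + u ** 2) + w * np.sum((xT - x_goal) ** 2)

def march(x0, x_goal, grids=(8, 16, 32)):
    """Solve on a coarse time grid first, then warm-start each finer grid
    by repeating the coarse controls: the progressive-refinement idea."""
    z = np.concatenate([[0.0], np.zeros(grids[0])])  # start: T = 1, u = 0
    for prev_n, n in zip((grids[0],) + grids, grids):
        if n != prev_n:  # upsample coarse controls onto the finer grid
            z = np.concatenate([[z[0]], np.repeat(z[1:], n // prev_n)])
        z = minimize(objective, z, args=(x0, x_goal, n)).x
    return rollout(z, x0, grids[-1])

T_star, u_star, xT = march(x0=[1.0, 0.0], x_goal=[0.0, 0.0])
print(f"horizon T* ~ {T_star:.3f}, terminal state ~ {xT}")
```

Warm-starting each stage from the coarse solution means the solver only needs to correct discretization error rather than solve from scratch, which is the intuition for why a marching scheme can raise the open-loop solver's success rate.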