🤖 AI Summary
Closed-loop control for robotic manipulation tasks with free terminal time poses the challenge of jointly optimizing task duration and control inputs. Method: This paper proposes an end-to-end learning framework featuring three key innovations: (i) progressive time discretization, (ii) a terminal-time-adaptive QRnet architecture, and (iii) an automated initial value problem (IVP) enhanced sampling strategy. The closed-loop policy network is trained with optimal open-loop solutions as supervision signals, combining marching-based temporal refinement with dynamic IVP sampling. Contribution/Results: The approach overcomes stability and generalization bottlenecks in policy training under free terminal time. It adapts the terminal time across diverse tasks and a broad distribution of initial states, achieving total costs close to the global optimum. Empirical evaluation demonstrates significantly higher solution success rates and improved robustness compared to prior methods.
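For context, the underlying problem can be written in a standard free-terminal-time optimal control form (a generic sketch, not necessarily the paper's exact formulation):

$$
\min_{T,\,u(\cdot)} \; \int_0^T \ell\big(x(t), u(t)\big)\,dt + M\big(x(T)\big)
\quad \text{s.t.} \quad \dot{x}(t) = f\big(x(t), u(t)\big), \quad x(0) = x_0,
$$

where, unlike the fixed-horizon case, the duration $T$ is optimized jointly with the control $u(\cdot)$, so the optimal duration varies with the initial state; this is what the closed-loop policy must adapt to.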
📝 Abstract
This paper presents a novel approach to learning free terminal time closed-loop control for robotic manipulation tasks, enabling dynamic adjustment of task duration and control inputs to enhance performance. We extend the supervised learning approach, in which selected open-loop optimal control problems are solved and their solutions serve as training data for a policy network, to the free terminal time setting. Three main challenges are addressed in this extension. First, we introduce a marching scheme that gradually refines the time discretization, improving solution quality and increasing the success rate of the open-loop solver. Second, we extend the QRnet of Nakamura-Zimmerer et al. (2021b) to the free terminal time setting to address the discontinuity at the terminal state and improve stability there. Third, we present a more automated version of the initial value problem (IVP) enhanced sampling method of previous work (Zhang et al., 2022) that adaptively updates the training dataset, significantly improving its quality. By integrating these techniques, we develop a closed-loop policy that operates effectively over a broad domain with varying optimal time durations, achieving total costs close to the global optimum.
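To make the marching idea concrete, below is a minimal, self-contained sketch on a toy problem (not the paper's implementation or task): a 1D double integrator with a time-plus-effort cost, transcribed by single shooting with the horizon T as a decision variable, where each finer time grid is warm-started from the upsampled coarse solution. All names, the log-parametrization of T, and the soft terminal penalty are illustrative choices.

```python
import numpy as np
from scipy.optimize import minimize

def rollout(z, x0, n):
    """Single-shooting rollout of a 1D double integrator with Euler steps.
    Decision vector z = [log T, u_0, ..., u_{n-1}]; the log keeps T > 0."""
    T, u = np.exp(z[0]), z[1:]
    dt, x = T / n, np.array(x0, dtype=float)
    for uk in u:
        x = x + dt * np.array([x[1], uk])  # dq/dt = v, dv/dt = u
    return T, u, x

def objective(z, x0, x_goal, n, w=1e3):
    """Time-plus-effort running cost integrated over [0, T], plus a soft
    terminal penalty; the weight w and penalty form are illustrative."""
    T, u, xT = rollout(z, x0, n)
    return T / n * np.sum(1.0 + u ** 2) + w * np.sum((xT - x_goal) ** 2)

def march(x0, x_goal, grids=(8, 16, 32)):
    """Solve on a coarse time grid first, then warm-start each finer grid
    by repeating the coarse controls: the progressive-refinement idea."""
    z = np.concatenate([[0.0], np.zeros(grids[0])])  # start: T = 1, u = 0
    for prev_n, n in zip((grids[0],) + grids, grids):
        if n != prev_n:  # upsample coarse controls onto the finer grid
            z = np.concatenate([[z[0]], np.repeat(z[1:], n // prev_n)])
        z = minimize(objective, z, args=(x0, x_goal, n)).x
    return rollout(z, x0, grids[-1])

T_star, u_star, xT = march(x0=[1.0, 0.0], x_goal=[0.0, 0.0])
print(f"horizon T* ~ {T_star:.3f}, terminal state ~ {xT}")
```

Warm-starting each stage from the coarse solution means the solver only needs to correct discretization error rather than solve from scratch, which is the intuition for why a marching scheme can raise the open-loop solver's success rate.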