🤖 AI Summary
Space manipulator arms operating from a free-floating base face significant challenges in capturing non-cooperative space debris, including low capture accuracy and high risks of self-collision or unintended target contact.
Method: This paper proposes a model-free operational-space trajectory planning approach based on the Twin Delayed Deep Deterministic Policy Gradient (TD3) reinforcement learning framework. A curriculum-based multi-critic network architecture is designed to jointly optimize capture-point tracking accuracy and multi-constraint obstacle avoidance. Prioritized experience replay is incorporated to enhance training stability, while local singularity avoidance and dexterity-enhancing control strategies are integrated for robust execution.
Results: Evaluated in a MATLAB/Simulink simulation of a 7-DOF KUKA LBR iiwa manipulator mounted on a free-floating base, the method autonomously generates safe, continuous, real-time operational-space trajectories. During dynamic pursuit it achieves high-precision capture-point tracking, complete self-collision avoidance, and a non-contact approach to the target, which substantially improves safety and robustness in active debris removal missions.
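The prioritized experience replay mentioned in the method summary can be illustrated with a minimal sketch of the common proportional variant. This is not the paper's implementation: the class name `PrioritizedReplayBuffer`, the exponents `alpha` and `beta`, and the list-based storage are assumptions chosen for readability (practical implementations usually use a sum-tree for efficiency).

```python
import random


class PrioritizedReplayBuffer:
    """Proportional prioritized replay (illustrative sketch, hypothetical names)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha      # how strongly priorities skew sampling (0 = uniform)
        self.eps = eps          # keeps every priority strictly positive
        self.data = []
        self.priorities = []
        self.pos = 0            # next slot to overwrite once the buffer is full

    def add(self, transition):
        # New transitions inherit the current maximum priority,
        # so they are likely to be replayed soon after insertion.
        p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4, rng=random):
        # Sampling probability P(i) is proportional to priority_i ** alpha.
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        idx = rng.choices(range(len(self.data)), weights=probs, k=batch_size)
        # Importance-sampling weights (N * P(i)) ** -beta correct the sampling bias;
        # they are normalized by the largest weight in the batch.
        n = len(self.data)
        weights = [(n * probs[i]) ** (-beta) for i in idx]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        return idx, [self.data[i] for i in idx], weights

    def update_priorities(self, idx, td_errors):
        # After a critic update, priority becomes |TD error| + eps.
        for i, err in zip(idx, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

In a TD3-style training loop, the returned weights would typically scale each sample's critic loss, and `update_priorities` would be called with the fresh TD errors after each critic update.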
📝 Abstract
The objective of this study is to develop a model-free workspace trajectory planner for space manipulators using a Twin Delayed Deep Deterministic Policy Gradient (TD3) agent to enable safe and reliable debris capture. A local control strategy with singularity avoidance and manipulability enhancement is employed to ensure stable execution. The manipulator must simultaneously track a capture point on a non-cooperative target, avoid self-collisions, and prevent unintended contact with the target. To address these challenges, we propose a curriculum-based multi-critic network in which one critic emphasizes accurate tracking and the other enforces collision avoidance. A prioritized experience replay buffer is also used to accelerate convergence and improve policy robustness. The framework is evaluated on a simulated seven-degree-of-freedom KUKA LBR iiwa mounted on a free-floating base in MATLAB/Simulink, demonstrating safe and adaptive trajectory generation for debris removal missions.
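The abstract does not specify how the local control strategy avoids singularities. One widely used option, shown here purely as an assumption, is to monitor the Yoshikawa manipulability measure w = sqrt(det(J Jᵀ)) and switch to a damped least-squares inverse when w becomes small. The sketch uses a toy planar two-link arm rather than the 7-DOF manipulator; the function names, the threshold `w_threshold`, and the damping limit `lambda_max` are all hypothetical.

```python
import math


def planar_jacobian(q1, q2, l1=1.0, l2=1.0):
    """End-effector Jacobian of a planar two-link arm (toy stand-in for a 7-DOF arm)."""
    s1, c1 = math.sin(q1), math.cos(q1)
    s12, c12 = math.sin(q1 + q2), math.cos(q1 + q2)
    return [[-l1 * s1 - l2 * s12, -l2 * s12],
            [l1 * c1 + l2 * c12, l2 * c12]]


def manipulability(J):
    """Yoshikawa measure sqrt(det(J J^T)); for a square J this equals |det J|."""
    return abs(J[0][0] * J[1][1] - J[0][1] * J[1][0])


def damped_joint_velocity(J, v, w_threshold=0.1, lambda_max=0.1):
    """Joint velocity qdot = J^T (J J^T + lam^2 I)^-1 v with variable damping."""
    w = manipulability(J)
    # Damping is zero away from singularities and grows as w falls below the threshold.
    if w >= w_threshold:
        lam2 = 0.0
    else:
        lam2 = (lambda_max ** 2) * (1.0 - w / w_threshold) ** 2
    # A = J J^T + lam2 * I, a symmetric 2x2 matrix [[a, b], [b, d]].
    a = J[0][0] ** 2 + J[0][1] ** 2 + lam2
    b = J[0][0] * J[1][0] + J[0][1] * J[1][1]
    d = J[1][0] ** 2 + J[1][1] ** 2 + lam2
    det = a * d - b * b
    # Solve A y = v in closed form, then map back with J^T.
    y0 = (d * v[0] - b * v[1]) / det
    y1 = (-b * v[0] + a * v[1]) / det
    return [J[0][0] * y0 + J[1][0] * y1,
            J[0][1] * y0 + J[1][1] * y1]
```

For this arm the measure reduces to l1·l2·|sin q2|, so it vanishes when the arm is fully stretched or folded. The damping trades a small tracking error near those poses for bounded joint velocities; a redundant 7-DOF arm would use the same formula with a 6×7 Jacobian and a general linear solver.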