Safe Obstacle-Free Guidance of Space Manipulators in Debris Removal Missions via Deep Reinforcement Learning

📅 2025-10-07
🤖 AI Summary
Problem: Space manipulator arms operating from a free-floating base face significant challenges in capturing non-cooperative space debris, including low capture accuracy and high risks of self-collision or unintended target contact.

Method: This paper proposes a model-free operational-space trajectory planning approach based on the Twin Delayed Deep Deterministic Policy Gradient (TD3) reinforcement learning framework. A curriculum-based multi-critic network architecture is designed to jointly optimize capture-point tracking accuracy and multi-constraint obstacle avoidance. Prioritized experience replay is incorporated to enhance training stability, while local singularity avoidance and dexterity-enhancing control strategies are integrated for robust execution.

Results: Evaluated on a MATLAB/Simulink simulation platform with a 7-DOF manipulator, the method autonomously generates safe, continuous, and real-time operational-space trajectories. It achieves high-precision capture-point tracking, complete self-collision avoidance, and non-contact target approach during dynamic pursuit, substantially improving safety and robustness in active debris removal missions.
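
For readers unfamiliar with TD3, the two ingredients that distinguish it from plain DDPG in the critic update are target-policy smoothing and a clipped double-Q target. The sketch below shows both for a scalar action in plain Python. The function names, the discount factor, the noise parameters, and the action bounds are illustrative assumptions and are not taken from the paper.

```python
import random


def smoothed_target_action(mu_next, noise_std=0.2, noise_clip=0.5,
                           act_low=-1.0, act_high=1.0, rng=random):
    """Target-policy smoothing: perturb the target actor's action with clipped Gaussian noise."""
    noise = rng.gauss(0.0, noise_std)
    noise = max(-noise_clip, min(noise_clip, noise))   # clip the noise itself
    return max(act_low, min(act_high, mu_next + noise))  # keep the action in bounds


def td3_target(reward, done, q1_next, q2_next, gamma=0.99):
    """Clipped double-Q target: bootstrap from the smaller of the two target-critic values."""
    bootstrap = 0.0 if done else gamma * min(q1_next, q2_next)
    return reward + bootstrap
```

Taking the minimum of the two target critics counteracts the overestimation bias that a single critic tends to accumulate. TD3 additionally delays actor updates relative to critic updates, which is not shown here.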

📝 Abstract
The objective of this study is to develop a model-free workspace trajectory planner for space manipulators using a Twin Delayed Deep Deterministic Policy Gradient (TD3) agent to enable safe and reliable debris capture. A local control strategy with singularity avoidance and manipulability enhancement is employed to ensure stable execution. The manipulator must simultaneously track a capture point on a non-cooperative target, avoid self-collisions, and prevent unintended contact with the target. To address these challenges, we propose a curriculum-based multi-critic network where one critic emphasizes accurate tracking and the other enforces collision avoidance. A prioritized experience replay buffer is also used to accelerate convergence and improve policy robustness. The framework is evaluated on a simulated seven-degree-of-freedom KUKA LBR iiwa mounted on a free-floating base in MATLAB/Simulink, demonstrating safe and adaptive trajectory generation for debris removal missions.
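
The abstract does not specify how the two critics are combined or how the curriculum is scheduled. As one possible reading only, the sketch below blends a tracking critic and a collision-avoidance critic with a safety weight that ramps up linearly over training. The schedule, the episode thresholds, the additive blend, and all names are hypothetical and should not be read as the authors' design.

```python
def curriculum_weight(episode, ramp_start=100, ramp_end=500):
    """Hypothetical schedule: safety weight rises linearly from 0 to 1 between two episode counts."""
    if episode <= ramp_start:
        return 0.0
    if episode >= ramp_end:
        return 1.0
    return (episode - ramp_start) / (ramp_end - ramp_start)


def combined_value(q_tracking, q_safety, episode):
    """Assumed blend: tracking value plus a curriculum-weighted collision-avoidance value."""
    w = curriculum_weight(episode)
    return q_tracking + w * q_safety


# Early in training only tracking matters; later the safety critic counts fully.
early = combined_value(2.0, -4.0, episode=50)    # weight 0.0
late = combined_value(2.0, -4.0, episode=600)    # weight 1.0
```

Under this assumed schedule the agent first learns to reach the capture point and is then penalized progressively more for collision risk, which is the general idea behind curriculum learning.
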
Problem

Research questions and friction points this paper is trying to address.

Develop model-free trajectory planner for space manipulators using TD3
Ensure safe debris capture while avoiding collisions and singularities
Enable adaptive trajectory generation for non-cooperative space targets
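
Closeness to a singularity is often monitored with a manipulability measure. The abstract mentions manipulability enhancement but does not say which measure is used, so the sketch below assumes the widely used Yoshikawa measure, sqrt(det(J Jᵀ)), and evaluates it on a toy planar two-link arm rather than the 7-DOF manipulator from the paper. Link lengths and function names are illustrative.

```python
import math


def planar_2r_jacobian(q1, q2, l1=1.0, l2=1.0):
    """Position Jacobian of a planar two-link arm with joint angles q1, q2 (radians)."""
    s1, c1 = math.sin(q1), math.cos(q1)
    s12, c12 = math.sin(q1 + q2), math.cos(q1 + q2)
    return [[-l1 * s1 - l2 * s12, -l2 * s12],
            [l1 * c1 + l2 * c12, l2 * c12]]


def manipulability(jac):
    """Yoshikawa measure sqrt(det(J J^T)); for a square 2x2 Jacobian this equals |det(J)|."""
    (a, b), (c, d) = jac
    return abs(a * d - b * c)


w_bent = manipulability(planar_2r_jacobian(0.3, math.pi / 2))  # elbow at 90 degrees
w_straight = manipulability(planar_2r_jacobian(0.3, 0.0))      # arm fully stretched
```

For this arm the measure reduces to l1 · l2 · |sin(q2)|, so it peaks with the elbow at 90 degrees and drops to zero when the arm is fully stretched, which is the singular configuration a local controller would steer away from.
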
Innovation

Methods, ideas, or system contributions that make the work stand out.

Model-free trajectory planner using TD3 agent
Curriculum-based multi-critic network with separate critics for tracking accuracy and collision avoidance
Prioritized experience replay buffer for accelerated convergence
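
As a rough illustration of the prioritized experience replay idea listed above, the following sketch implements the common proportional variant using only the Python standard library. The class name, the hyperparameters `alpha`, `beta`, and `eps`, and the list-based storage are assumptions chosen for brevity; the paper's buffer may differ.

```python
import random


class PrioritizedReplayBuffer:
    """Minimal proportional prioritized replay (list-based, O(n) sampling)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha        # how strongly priorities skew sampling
        self.eps = eps            # keeps every priority strictly positive
        self.data = []
        self.priorities = []
        self.pos = 0              # next slot to overwrite once full

    def add(self, transition):
        # New transitions get the current maximum priority so they are likely to be replayed soon.
        p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4, rng=random):
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        idxs = rng.choices(range(len(self.data)), weights=probs, k=batch_size)
        # Importance-sampling weights correct the bias from non-uniform sampling;
        # they are normalized by the batch maximum here for simplicity.
        n = len(self.data)
        weights = [(n * probs[i]) ** (-beta) for i in idxs]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        return idxs, [self.data[i] for i in idxs], weights

    def update_priorities(self, idxs, td_errors):
        # Priority is the absolute TD error plus a small constant.
        for i, err in zip(idxs, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

In a training loop, the returned importance weights would scale each sample's critic loss, and `update_priorities` would be called with the fresh TD errors after every update. A sum-tree is the usual choice when the buffer is large enough that O(n) sampling becomes a bottleneck.
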
Vincent Lam
Embodied Learning and Intelligence for eXploration and Innovative soft Robotics (ELIXIR) Lab, Toronto Metropolitan University, Toronto, ON, Canada
Robin Chhabra
Professor of Robotics & Mechatronics, Toronto Metropolitan University
Soft Robotics, Embodied AI, Multi-Robot Systems, Robotic Self Perception, Geometric Mechanics