Safe Obstacle-Free Guidance of Space Manipulators in Debris Removal Missions via Deep Reinforcement Learning

📅 2025-10-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Space manipulators operating from a free-floating base struggle to capture non-cooperative debris: capture accuracy is low, and the risk of self-collision or unintended contact with the target is high.

Method: This paper proposes a model-free operational-space trajectory planner based on the Twin Delayed Deep Deterministic Policy Gradient (TD3) reinforcement learning algorithm. A curriculum-based multi-critic network jointly optimizes capture-point tracking accuracy and obstacle avoidance under multiple constraints. Prioritized experience replay is used to improve training stability, and a local control strategy with singularity avoidance and manipulability enhancement supports robust execution.

Results: Evaluated in MATLAB/Simulink simulation with a 7-DOF manipulator, the method autonomously generates safe, continuous operational-space trajectories in real time. It is reported to achieve high-precision capture-point tracking, self-collision avoidance, and a contact-free approach to the target during dynamic pursuit, improving safety and robustness in active debris removal missions.
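As a point of reference for the TD3 framework named above, the following is a minimal sketch of the standard TD3 bootstrap target (target policy smoothing plus the minimum over two target critics). It shows generic TD3 only; how the paper's tracking and collision-avoidance critics enter the target is not stated in this summary. The function name `td3_target`, the callables `target_actor`, `target_q1`, `target_q2`, and all default hyperparameters are assumptions chosen for illustration.

```python
import random


def clip(x, lo, hi):
    """Clamp a scalar to the interval [lo, hi]."""
    return max(lo, min(hi, x))


def td3_target(reward, next_state, done, target_actor, target_q1, target_q2,
               gamma=0.99, noise_std=0.2, noise_clip=0.5,
               action_low=-1.0, action_high=1.0, rng=random):
    """Generic TD3 target value (hypothetical helper, not the paper's code).

    target_actor(state) -> list of floats (action)
    target_q1/target_q2(state, action) -> float
    """
    # Target policy smoothing: add clipped Gaussian noise to the target action,
    # then clamp the result to the valid action range.
    next_action = []
    for a in target_actor(next_state):
        noise = clip(rng.gauss(0.0, noise_std), -noise_clip, noise_clip)
        next_action.append(clip(a + noise, action_low, action_high))

    # Clipped double-Q learning: take the smaller of the two critic estimates
    # to reduce overestimation bias.
    q_min = min(target_q1(next_state, next_action),
                target_q2(next_state, next_action))

    # No bootstrapping past a terminal transition.
    return reward + gamma * (0.0 if done else 1.0) * q_min
```

Both online critics would be regressed toward this target, with the actor updated less frequently than the critics (the "delayed" part of TD3).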

📝 Abstract

The objective of this study is to develop a model-free workspace trajectory planner for space manipulators using a Twin Delayed Deep Deterministic Policy Gradient (TD3) agent to enable safe and reliable debris capture. A local control strategy with singularity avoidance and manipulability enhancement is employed to ensure stable execution. The manipulator must simultaneously track a capture point on a non-cooperative target, avoid self-collisions, and prevent unintended contact with the target. To address these challenges, we propose a curriculum-based multi-critic network where one critic emphasizes accurate tracking and the other enforces collision avoidance. A prioritized experience replay buffer is also used to accelerate convergence and improve policy robustness. The framework is evaluated on a simulated seven-degree-of-freedom KUKA LBR iiwa mounted on a free-floating base in MATLAB/Simulink, demonstrating safe and adaptive trajectory generation for debris removal missions.
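The abstract does not specify how singularity avoidance and manipulability enhancement are implemented. As one common way to realize such a local control layer, the sketch below computes the Yoshikawa manipulability measure and a damped least-squares velocity solution whose damping grows near a singularity. It uses a planar two-link arm purely for illustration (not the 7-DOF arm or the free-floating base dynamics of the paper); the function names, link lengths, and thresholds are all assumptions.

```python
import math


def planar_jacobian(q1, q2, l1=1.0, l2=1.0):
    """End-effector position Jacobian of a planar two-link arm (illustrative)."""
    s1, c1 = math.sin(q1), math.cos(q1)
    s12, c12 = math.sin(q1 + q2), math.cos(q1 + q2)
    return [[-l1 * s1 - l2 * s12, -l2 * s12],
            [l1 * c1 + l2 * c12, l2 * c12]]


def manipulability(J):
    """Yoshikawa measure sqrt(det(J J^T)); for a square J this equals |det J|."""
    return abs(J[0][0] * J[1][1] - J[0][1] * J[1][0])


def adaptive_damping(w, w_min=0.1, lam_max=0.1):
    """Damping factor that is zero when w >= w_min and rises to lam_max at w = 0."""
    return lam_max * (1.0 - w / w_min) if w < w_min else 0.0


def damped_joint_velocity(J, xdot, damping):
    """Damped least squares: qdot = J^T (J J^T + damping^2 I)^-1 xdot (2x2 case)."""
    # Entries of the symmetric matrix A = J J^T + damping^2 I.
    a = J[0][0] ** 2 + J[0][1] ** 2 + damping ** 2
    b = J[0][0] * J[1][0] + J[0][1] * J[1][1]
    d = J[1][0] ** 2 + J[1][1] ** 2 + damping ** 2
    det = a * d - b * b
    # y = A^-1 xdot via the closed-form 2x2 inverse.
    y0 = (d * xdot[0] - b * xdot[1]) / det
    y1 = (-b * xdot[0] + a * xdot[1]) / det
    # qdot = J^T y.
    return [J[0][0] * y0 + J[1][0] * y1,
            J[0][1] * y0 + J[1][1] * y1]
```

In use, one would evaluate `manipulability` at the current joint configuration, pass it to `adaptive_damping`, and feed the resulting damping into `damped_joint_velocity`; the damping trades a small tracking error for bounded joint velocities near a singular configuration.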

Problem

Research questions and friction points this paper is trying to address.

Develop model-free trajectory planner for space manipulators using TD3

Ensure safe debris capture while avoiding collisions and singularities

Enable adaptive trajectory generation for non-cooperative space targets

Innovation

Methods, ideas, or system contributions that make the work stand out.

Model-free trajectory planner using TD3 agent

Curriculum-based multi-critic network with separate critics for tracking accuracy and collision avoidance

Prioritized experience replay buffer for accelerated convergence
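To make the prioritized experience replay item concrete, here is a minimal sketch of the widely used proportional variant, in which transitions are sampled with probability proportional to a power of their TD error and corrected with importance-sampling weights. The paper's actual buffer design is not described on this page, so the class name `PrioritizedReplayBuffer`, the list-based storage, and the defaults for `alpha`, `beta`, and `eps` are assumptions.

```python
import random


class PrioritizedReplayBuffer:
    """Proportional prioritized replay (illustrative, O(N) sampling)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha      # 0 = uniform sampling, 1 = fully prioritized
        self.eps = eps          # keeps every priority strictly positive
        self.data = []
        self.priorities = []
        self.pos = 0            # next slot to overwrite once the buffer is full

    def add(self, transition):
        # New transitions get the current maximum priority so that each one
        # is likely to be replayed at least once.
        max_p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(max_p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = max_p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4, rng=random):
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        idx = rng.choices(range(len(self.data)), weights=probs, k=batch_size)
        # Importance-sampling weights correct the bias from non-uniform
        # sampling; here they are normalized by the largest weight in the batch.
        n = len(self.data)
        weights = [(n * probs[i]) ** (-beta) for i in idx]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        return idx, [self.data[i] for i in idx], weights

    def update_priorities(self, indices, td_errors):
        # After a critic update, priorities track the magnitude of the TD error.
        for i, err in zip(indices, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

A training loop would multiply each sample's critic loss by its weight and then call `update_priorities` with the fresh TD errors; a sum-tree would replace the linear scan in `sample` for large buffers.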
