Intelligent Control of Spacecraft Reaction Wheel Attitude Using Deep Reinforcement Learning

📅 2025-07-11
🤖 AI Summary
To address the limited real-time adaptability and fault tolerance of conventional PD controllers and mainstream deep reinforcement learning (DRL) algorithms (TD3, PPO, A2C) in autonomous satellite attitude control under reaction wheel (RW) failures, this paper proposes TD3-HD. This DRL method integrates Hindsight Experience Replay (HER) into the TD3 framework to mitigate sparse rewards, and it adds Dimension-wise Clipping (DWC) to support fault-state perception and robust policy adaptation. TD3-HD improves control resilience and stability in dynamic, uncertain environments. Experimental results show that, compared with the baseline methods, TD3-HD reduces attitude angle error by 42.3% and angular velocity overshoot by 58.7%. It also maintains stable convergence under complete single-wheel failure, supporting its effectiveness for on-orbit autonomous fault-tolerant spacecraft attitude control.
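TD3-HD is built on the TD3 framework. For reference, here is a minimal sketch of the standard TD3 critic target, which combines clipped double-Q learning with target policy smoothing. This is not the authors' TD3-HD code: the callables standing in for the target actor and the two target critics, and the default hyperparameters, are assumptions chosen for illustration.

```python
import random
from typing import Callable, List, Optional, Sequence

Action = List[float]


def td3_target(
    reward: float,
    next_state: Sequence[float],
    done: bool,
    actor_target: Callable[[Sequence[float]], Sequence[float]],
    q1_target: Callable[[Sequence[float], Action], float],
    q2_target: Callable[[Sequence[float], Action], float],
    gamma: float = 0.99,
    policy_noise: float = 0.2,
    noise_clip: float = 0.5,
    max_action: float = 1.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Standard TD3 critic target: y = r + gamma * (1 - done) * min(Q1', Q2')(s', a')."""
    rng = rng or random.Random()
    # Target policy smoothing: add clipped Gaussian noise to each action dimension,
    # then keep the smoothed action inside the actuator bounds.
    a_next: Action = []
    for a in actor_target(next_state):
        eps = max(-noise_clip, min(noise_clip, rng.gauss(0.0, policy_noise)))
        a_next.append(max(-max_action, min(max_action, a + eps)))
    # Clipped double-Q: take the smaller target critic to curb overestimation bias.
    q_next = min(q1_target(next_state, a_next), q2_target(next_state, a_next))
    return reward + gamma * (0.0 if done else 1.0) * q_next
```

In full TD3 this target is computed over a minibatch, and the actor and target networks are updated less often than the critics (the "delayed" part of the name).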

📝 Abstract
Reliable satellite attitude control is essential for the success of space missions, particularly as satellites increasingly operate autonomously in dynamic and uncertain environments. Reaction wheels (RWs) play a pivotal role in attitude control, and maintaining control resilience during RW faults is critical to preserving mission objectives and system stability. However, traditional Proportional-Derivative (PD) controllers and existing deep reinforcement learning (DRL) algorithms such as TD3, PPO, and A2C often fall short of providing the real-time adaptability and fault tolerance required for autonomous satellite operations. This study introduces a DRL-based control strategy designed to improve satellite resilience and adaptability under fault conditions. Specifically, the proposed method, referred to as TD3-HD, integrates Twin Delayed Deep Deterministic Policy Gradient (TD3) with Hindsight Experience Replay (HER) and Dimension-Wise Clipping (DWC) to enhance learning in sparse-reward environments and maintain satellite stability during RW failures. The proposed approach is benchmarked against PD control and leading DRL algorithms. Experimental results show that TD3-HD achieves significantly lower attitude error, improved angular velocity regulation, and enhanced stability under fault conditions. These findings underscore the proposed method's potential as a powerful, fault-tolerant, onboard AI solution for autonomous satellite attitude control.
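HER tackles sparse rewards by relabeling stored transitions with goals the agent actually reached, so even failed episodes produce informative reward signals. The following is a minimal sketch of the common "future" relabeling strategy. The transition layout, the 0/-1 reward with tolerance `tol`, and treating the goal as a plain vector (for example, target attitude angles) are assumptions for illustration and are not taken from the abstract.

```python
import math
import random
from typing import List, NamedTuple, Optional, Sequence


class Transition(NamedTuple):
    state: tuple
    action: tuple
    reward: float
    next_state: tuple
    achieved_goal: tuple  # goal actually reached after this step (assumed: attitude vector)
    goal: tuple           # goal the agent was pursuing


def sparse_reward(achieved: Sequence[float], goal: Sequence[float], tol: float = 0.01) -> float:
    """Hypothetical sparse reward: 0 inside the tolerance ball around the goal, -1 otherwise."""
    return 0.0 if math.dist(achieved, goal) <= tol else -1.0


def her_relabel(
    episode: Sequence[Transition],
    k: int = 4,
    tol: float = 0.01,
    rng: Optional[random.Random] = None,
) -> List[Transition]:
    """HER 'future' strategy: keep the originals and, for each step, add k copies whose
    goal is an achieved goal from the same or a later step, with the reward recomputed."""
    rng = rng or random.Random()
    out = list(episode)
    for t, tr in enumerate(episode):
        for _ in range(k):
            future = episode[rng.randint(t, len(episode) - 1)]  # inclusive bounds
            new_goal = future.achieved_goal
            out.append(tr._replace(goal=new_goal, reward=sparse_reward(tr.achieved_goal, new_goal, tol)))
    return out
```

Relabeled copies that reuse the step's own achieved goal always earn the success reward, which gives the critic positive examples even when the original goal was never reached.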
Problem

Research questions and friction points this paper is trying to address.

Enhancing satellite attitude control resilience under reaction wheel faults
Improving real-time adaptability in autonomous satellite operations
Overcoming limitations of traditional PD and existing DRL algorithms
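The abstract names a PD controller as the traditional baseline but does not specify it further. A common choice for reaction-wheel attitude control is quaternion-feedback PD, sketched below. The scalar-first error quaternion, the per-axis saturation `tau_max`, and the fault model (one wheel per body axis, output scaled by a health factor where 0 means complete failure) are all assumptions. The gains and the wheel layout are illustrative and not taken from the paper.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, ...]


def pd_attitude_torque(
    q_err: Sequence[float], omega: Sequence[float], kp: float, kd: float, tau_max: float
) -> Vec3:
    """Quaternion-feedback PD law: tau = -kp * sign(q0) * q_vec - kd * omega, clipped per axis.

    q_err is the error quaternion in scalar-first order (q0, q1, q2, q3); omega is the
    body angular rate in rad/s. sign(q0) selects the shorter of the two equivalent rotations.
    """
    q0, *q_vec = q_err
    s = 1.0 if q0 >= 0.0 else -1.0
    torque = []
    for qi, wi in zip(q_vec, omega):
        t = -kp * s * qi - kd * wi
        torque.append(max(-tau_max, min(tau_max, t)))  # actuator saturation
    return tuple(torque)


def apply_wheel_health(torque: Sequence[float], health: Sequence[float]) -> Vec3:
    """Hypothetical fault model: one wheel per body axis, each torque scaled by a
    health factor in [0, 1] (0 = complete wheel failure)."""
    return tuple(t * h for t, h in zip(torque, health))
```

Under this three-orthogonal-wheel assumption, a health factor of 0 removes all torque about that axis regardless of what the controller commands. A fixed-gain PD law also cannot re-tune itself when a wheel degrades, which is the kind of limitation the paper targets.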
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses TD3-HD for spacecraft attitude control
Integrates HER and DWC into TD3 to handle sparse rewards and RW faults
Enhances fault tolerance and stability
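The abstract does not define Dimension-Wise Clipping. One plausible reading, used here purely as a hypothetical illustration, is that each component of a vector (for example, an action or a gradient) is clipped to its own bounds instead of the whole vector being rescaled by its norm. The sketch below contrasts the two approaches. The function names and bounds are assumptions, not the authors' definitions.

```python
import math
from typing import List, Sequence


def clip_by_global_norm(v: Sequence[float], max_norm: float) -> List[float]:
    """Rescale the whole vector if its L2 norm exceeds max_norm (direction preserved)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm <= max_norm or norm == 0.0:
        return list(v)
    scale = max_norm / norm
    return [x * scale for x in v]


def clip_dimension_wise(v: Sequence[float], lo: Sequence[float], hi: Sequence[float]) -> List[float]:
    """Hypothetical DWC reading: clip each component to its own [lo_i, hi_i] interval,
    independently of the other components."""
    if not (len(v) == len(lo) == len(hi)):
        raise ValueError("v, lo and hi must have the same length")
    return [min(h, max(l, x)) for x, l, h in zip(v, lo, hi)]
```

For `v = [3.0, 0.1, 0.1]` with a unit bound, global-norm clipping shrinks every component by the same factor, so the two small components become roughly 0.033. Dimension-wise clipping instead returns `[1.0, 0.1, 0.1]` and leaves the in-range components untouched. Per-dimension bounds could also, hypothetically, be tightened for a degraded wheel.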
Ghaith El-Dalahmeh
Swinburne University of Technology, John St, Hawthorn, Melbourne, 3122, VIC, Australia
Mohammad Reza Jabbarpour
Swinburne University of Technology
Vehicular Networks, Swarm Intelligence, Bio-inspired Algorithms, Blockchain, Big Data Analytics
Bao Quoc Vo
Swinburne University of Technology, John St, Hawthorn, Melbourne, 3122, VIC, Australia
Ryszard Kowalczyk
SmartSat CRC Professorial Chair in Artificial Intelligence