Deep Reinforcement Learning Policies for Underactuated Satellite Attitude Control

📅 2025-04-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the rapid large-angle attitude reorientation problem for underactuated satellites suffering a single-actuator failure. We propose a deep reinforcement learning method based on a customized Proximal Policy Optimization (PPO) algorithm. A high-fidelity dynamical model and an adaptive reward function enable full-state autonomous pointing control in the inertial frame, covering both nominal operation and single-axis failure scenarios. To our knowledge, this is the first application of PPO to underactuated satellite attitude control explicitly designed for policy robustness and reliable sim-to-real transfer. Validation on representative hardware demonstrates industry-standard pointing accuracy (<0.1°), fast convergence during large-angle maneuvers, and strong generalization across diverse initial conditions and fault configurations, significantly enhancing autonomous fault-tolerant attitude control capability.
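The summary mentions an adaptive reward function driving the policy toward sub-0.1° pointing accuracy. The paper's exact reward is not reproduced in this card; a minimal sketch of one plausible shaping, assuming unit quaternions in `[w, x, y, z]` order and illustrative gains (`k_att`, `k_rate`, and the terminal bonus are all assumptions, not the authors' values):

```python
import numpy as np

def quat_error_angle(q, q_target):
    """Rotation angle (rad) between current and target attitude quaternions.

    For unit quaternions, |q . q_target| = cos(theta/2) of the relative
    rotation, so theta = 2 * arccos(|q . q_target|).
    """
    dot = abs(float(np.dot(q, q_target)))
    return 2.0 * np.arccos(np.clip(dot, -1.0, 1.0))

def reward(q, q_target, omega, k_att=1.0, k_rate=0.1, bonus_deg=0.1):
    """Hypothetical shaped reward: penalize pointing error and body rates,
    with a bonus once inside a 0.1-degree accuracy band."""
    theta = quat_error_angle(q, q_target)
    r = -k_att * theta - k_rate * float(np.linalg.norm(omega))
    if np.degrees(theta) < bonus_deg:
        r += 10.0  # reward reaching industry-standard pointing accuracy
    return r
```

A perfectly aligned, non-rotating state then scores strictly higher than any misaligned one, which is the gradient the agent exploits during large-angle slews.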

📝 Abstract
Autonomy is a key challenge for future space exploration endeavours. Deep Reinforcement Learning holds the promise of developing agents able to learn complex behaviours simply by interacting with their environment. This paper investigates the use of Reinforcement Learning for the satellite attitude control problem, namely the angular reorientation of a spacecraft with respect to an inertial frame of reference. In the proposed approach, a set of control policies is implemented as neural networks trained with a custom version of the Proximal Policy Optimization algorithm to maneuver a small satellite from a random starting angle to a given pointing target. In particular, we address the problem for two working conditions: the nominal case, in which all the actuators (a set of 3 reaction wheels) are working properly, and the underactuated case, where an actuator failure is simulated randomly along one of the axes. We show that the agents learn to effectively perform large-angle slew maneuvers with fast convergence and industry-standard pointing accuracy. Furthermore, we test the proposed method on representative hardware, showing that, by taking adequate measures, controllers trained in simulation can perform well on real systems.
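The two working conditions in the abstract (nominal vs. a randomly failed reaction wheel) come down to masking one body-axis torque in the simulated dynamics. A minimal single-step sketch of such an environment, assuming body-axis wheels modeled as pure torque sources, Euler integration, and an illustrative nanosatellite inertia matrix (all numbers are assumptions, not the paper's):

```python
import numpy as np

I_BODY = np.diag([0.05, 0.05, 0.02])  # example nanosat inertia (kg m^2)
I_INV = np.linalg.inv(I_BODY)

def step(q, omega, torque_cmd, failed_axis=None, dt=0.1):
    """One Euler step of rigid-body attitude dynamics with three body-axis
    reaction wheels; a failed wheel contributes no torque."""
    tau = np.array(torque_cmd, dtype=float)
    if failed_axis is not None:
        tau[failed_axis] = 0.0  # simulate the single-actuator failure
    # Euler's rotational equation: I * omega_dot = tau - omega x (I * omega)
    omega = omega + dt * (I_INV @ (tau - np.cross(omega, I_BODY @ omega)))
    # Quaternion kinematics: q_dot = 0.5 * q (x) [0, omega], q = [w, x, y, z]
    w, x, y, z = q
    ox, oy, oz = omega
    q_dot = 0.5 * np.array([
        -x * ox - y * oy - z * oz,
         w * ox + y * oz - z * oy,
         w * oy + z * ox - x * oz,
         w * oz + x * oy - y * ox,
    ])
    q = q + dt * q_dot
    return q / np.linalg.norm(q), omega
```

In the underactuated case the policy must build up rotation about the failed axis indirectly through the gyroscopic coupling term `omega x (I * omega)`, which is what makes the problem hard for classical controllers.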
Problem

Research questions and friction points this paper is trying to address.

Develops DRL policies for satellite attitude control
Addresses underactuated conditions with actuator failures
Validates simulation-trained controllers on real hardware
Innovation

Methods, ideas, or system contributions that make the work stand out.

Deep Reinforcement Learning for satellite attitude control
Proximal Policy Optimization with custom modifications
Neural networks managing underactuated and nominal conditions
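The card does not detail the custom PPO modifications, but the clipped surrogate objective the method builds on is standard. A minimal sketch of that objective (to be maximized), where `ratio` is the new-to-old policy probability ratio and `advantage` the estimated advantage:

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """PPO clipped surrogate: mean of min(r * A, clip(r, 1-eps, 1+eps) * A).

    Clipping removes the incentive to move the policy ratio outside
    [1 - eps, 1 + eps], which stabilizes each policy update.
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return float(np.mean(np.minimum(unclipped, clipped)))
```

With a positive advantage the objective caps the gain at `(1 + eps) * A`; with a negative advantage it caps how much a sample can be down-weighted, so no single batch can push the attitude policy far from its previous behaviour.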