A causal learning approach to in-orbit inertial parameter estimation for multi-payload deployers

📅 2025-01-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
Real-time inertial parameter estimation for on-orbit multi-payload deployers remains challenging under unknown initial states and dynamically changing configurations. Method: This paper proposes an online identification method that integrates causal learning with proximal policy optimization (PPO)-based reinforcement learning. It constructs an excitation–response causal model, designs a configuration classifier based on time-series clustering and dynamic time warping (DTW), and establishes an automated framework for optimizing excitation sequences, enabling high-accuracy, initialization-free parameter estimation and real-time validation of configuration jumps. Contribution/Results: Experimental results demonstrate a >23% improvement in configuration classification accuracy and a fivefold speed-up in the search for optimal excitation sequences. To the best of our knowledge, this is the first work to combine causal learning and PPO for online estimation of spacecraft inertial parameters, establishing a new paradigm for autonomous on-orbit deployment.
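The DTW-based configuration classifier described above can be sketched as a nearest-template matcher: simulate each candidate configuration's response to a known excitation, then assign a measured response to the configuration whose template is DTW-closest. The single-axis dynamics, inertia values, and configuration names below are illustrative assumptions, not the paper's actual simulation setup.

```python
def dtw_distance(a, b):
    """Classic O(n*m) dynamic-programming DTW between two 1-D sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def classify(response, templates):
    """Nearest-template classification: pick the configuration whose
    simulated response is DTW-closest to the measured one."""
    return min(templates, key=lambda name: dtw_distance(response, templates[name]))

def simulate(inertia, torque=1.0, dt=0.1, steps=50):
    """Toy single-axis rigid-body response: omega_dot = tau / I."""
    omega, trace = 0.0, []
    for _ in range(steps):
        omega += (torque / inertia) * dt
        trace.append(omega)
    return trace

# Hypothetical inertia values (kg*m^2) for three deployer configurations.
templates = {"full": simulate(120.0), "half": simulate(70.0), "empty": simulate(30.0)}
measured = simulate(68.0)  # true inertia lies near the "half" configuration
print(classify(measured, templates))  # prints "half"
```

Because DTW allows elastic time alignment, the same matcher tolerates the timing variations between simulated and measured responses that a pointwise Euclidean comparison would penalise.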

📝 Abstract
This paper discusses an approach to inertial parameter estimation for cargo-carrying spacecraft based on causal learning, i.e. learning from the spacecraft's responses under actuation. Different spacecraft configurations (inertial parameter sets) are simulated under different actuation profiles in order to produce an optimised time-series clustering classifier that can distinguish between them. The actuation consists of finite sequences of constant inputs, applied in order and based on the actuators typically available. By learning from the system's responses across multiple input sequences, and then applying measures of time-series similarity and the F1-score, an optimal actuation sequence can be chosen either for one specific system configuration or for the overall set of possible configurations. This allows both estimation of the inertial parameter set without any prior knowledge of the state and validation of transitions between configurations after a deployment event. The optimisation of the actuation sequence is handled by a reinforcement learning model using the proximal policy optimisation (PPO) algorithm, which repeatedly tries different sequences and evaluates their impact on classifier performance according to a multi-objective metric.
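The excitation-sequence optimisation can be illustrated with a much simpler stand-in: a greedy hill-climb (in place of the paper's PPO agent) that scores each candidate torque sequence by how far apart it drives the configurations' responses, using Euclidean distance in place of DTW and a separability score in place of the paper's F1-based multi-objective metric. All torque levels, inertias, and dynamics here are assumptions for illustration only.

```python
import random

# Assumed discrete actuator settings and sequence length; the paper's real
# action space and PPO optimiser are replaced by a simple hill-climb.
TORQUE_LEVELS = [-1.0, 0.0, 1.0]
SEQ_LEN = 5
INERTIAS = {"full": 120.0, "half": 70.0, "empty": 30.0}  # illustrative values

def respond(inertia, seq, dt=0.1, steps_per_input=10):
    """Single-axis angular-velocity trace under a piecewise-constant torque sequence."""
    omega, trace = 0.0, []
    for tau in seq:
        for _ in range(steps_per_input):
            omega += (tau / inertia) * dt
            trace.append(omega)
    return trace

def separability(seq):
    """Reward proxy: smallest pairwise Euclidean gap between configuration
    responses (stand-in for the paper's DTW/F1 multi-objective metric)."""
    resp = [respond(I, seq) for I in INERTIAS.values()]
    gaps = [sum((x - y) ** 2 for x, y in zip(resp[i], resp[j])) ** 0.5
            for i in range(len(resp)) for j in range(i + 1, len(resp))]
    return min(gaps)

def search(iters=200, seed=0):
    """Greedy hill-climb over excitation sequences, accepting any mutation
    that makes the configurations easier to tell apart."""
    rng = random.Random(seed)
    best = [rng.choice(TORQUE_LEVELS) for _ in range(SEQ_LEN)]
    best_score = separability(best)
    for _ in range(iters):
        cand = list(best)
        cand[rng.randrange(SEQ_LEN)] = rng.choice(TORQUE_LEVELS)
        score = separability(cand)
        if score > best_score:
            best, best_score = cand, score
    return best, best_score

best_seq, score = search()
print(best_seq, round(score, 3))
```

A PPO agent replaces this naive mutation loop with a learned policy over sequence elements, which is what yields the search-efficiency gains reported in the summary; the reward interface (sequence in, classifier-quality score out) stays the same.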
Problem

Research questions and friction points this paper is trying to address.

Multi-payload Deployer
Parameter Calculation
State Verification
Innovation

Methods, ideas, or system contributions that make the work stand out.

Causal Learning
Reinforcement Learning
Adaptive Control