A causal learning approach to in-orbit inertial parameter estimation for multi-payload deployers

📅 2025-01-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
Real-time inertial parameter estimation for on-orbit multi-payload deployers remains challenging under unknown initial states and dynamically changing configurations. Method: This paper proposes an online identification method that integrates causal learning with proximal policy optimization (PPO)-based reinforcement learning. It constructs an excitation–response causal model, designs a configuration classifier based on time-series clustering and dynamic time warping (DTW), and establishes an automated framework for optimizing excitation sequences, enabling high-accuracy, initialization-free parameter estimation and real-time validation of configuration jumps. Contribution/Results: Experimental results demonstrate a >23% improvement in configuration classification accuracy and a fivefold speed-up in the search for optimal excitation sequences. To the best of our knowledge, this is the first work to combine causal learning and PPO for online estimation of spacecraft inertial parameters, establishing a new paradigm for autonomous on-orbit deployment.
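The DTW-based configuration classifier described above can be sketched as a nearest-template matcher: simulate each candidate configuration's response to a known excitation, then assign a measured response to the configuration whose template is DTW-closest. The single-axis dynamics, inertia values, and configuration names below are illustrative assumptions, not the paper's actual simulation setup.

```python
def dtw_distance(a, b):
    """Classic O(n*m) dynamic-programming DTW between two 1-D sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def classify(response, templates):
    """Nearest-template classification: pick the configuration whose
    simulated response is DTW-closest to the measured one."""
    return min(templates, key=lambda name: dtw_distance(response, templates[name]))

def simulate(inertia, torque=1.0, dt=0.1, steps=50):
    """Toy single-axis rigid-body response: omega_dot = tau / I."""
    omega, trace = 0.0, []
    for _ in range(steps):
        omega += (torque / inertia) * dt
        trace.append(omega)
    return trace

# Hypothetical inertia values (kg*m^2) for three deployer configurations.
templates = {"full": simulate(120.0), "half": simulate(70.0), "empty": simulate(30.0)}
measured = simulate(68.0)  # true inertia lies near the "half" configuration
print(classify(measured, templates))  # prints "half"
```

Because DTW allows elastic time alignment, the same matcher tolerates the timing variations between simulated and measured responses that a pointwise Euclidean comparison would penalise.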

📝 Abstract
This paper discusses an approach to inertial parameter estimation for cargo-carrying spacecraft based on causal learning, i.e. learning from the spacecraft's responses under actuation. Different spacecraft configurations (inertial parameter sets) are simulated under different actuation profiles in order to produce an optimised time-series clustering classifier that can distinguish between them. The actuation consists of finite sequences of constant inputs, applied in order and based on the actuators typically available. By learning from the system's responses across multiple input sequences, and then applying measures of time-series similarity and the F1-score, an optimal actuation sequence can be chosen either for one specific system configuration or for the overall set of possible configurations. This allows both estimation of the inertial parameter set without any prior knowledge of the state and validation of transitions between configurations after a deployment event. The optimisation of the actuation sequence is handled by a reinforcement learning model using the proximal policy optimisation (PPO) algorithm, which repeatedly tries different sequences and evaluates their impact on classifier performance according to a multi-objective metric.
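The excitation-sequence optimisation can be illustrated with a much simpler stand-in: a greedy hill-climb (in place of the paper's PPO agent) that scores each candidate torque sequence by how far apart it drives the configurations' responses, using Euclidean distance in place of DTW and a separability score in place of the paper's F1-based multi-objective metric. All torque levels, inertias, and dynamics here are assumptions for illustration only.

```python
import random

# Assumed discrete actuator settings and sequence length; the paper's real
# action space and PPO optimiser are replaced by a simple hill-climb.
TORQUE_LEVELS = [-1.0, 0.0, 1.0]
SEQ_LEN = 5
INERTIAS = {"full": 120.0, "half": 70.0, "empty": 30.0}  # illustrative values

def respond(inertia, seq, dt=0.1, steps_per_input=10):
    """Single-axis angular-velocity trace under a piecewise-constant torque sequence."""
    omega, trace = 0.0, []
    for tau in seq:
        for _ in range(steps_per_input):
            omega += (tau / inertia) * dt
            trace.append(omega)
    return trace

def separability(seq):
    """Reward proxy: smallest pairwise Euclidean gap between configuration
    responses (stand-in for the paper's DTW/F1 multi-objective metric)."""
    resp = [respond(I, seq) for I in INERTIAS.values()]
    gaps = [sum((x - y) ** 2 for x, y in zip(resp[i], resp[j])) ** 0.5
            for i in range(len(resp)) for j in range(i + 1, len(resp))]
    return min(gaps)

def search(iters=200, seed=0):
    """Greedy hill-climb over excitation sequences, accepting any mutation
    that makes the configurations easier to tell apart."""
    rng = random.Random(seed)
    best = [rng.choice(TORQUE_LEVELS) for _ in range(SEQ_LEN)]
    best_score = separability(best)
    for _ in range(iters):
        cand = list(best)
        cand[rng.randrange(SEQ_LEN)] = rng.choice(TORQUE_LEVELS)
        score = separability(cand)
        if score > best_score:
            best, best_score = cand, score
    return best, best_score

best_seq, score = search()
print(best_seq, round(score, 3))
```

A PPO agent replaces this naive mutation loop with a learned policy over sequence elements, which is what yields the search-efficiency gains reported in the summary; the reward interface (sequence in, classifier-quality score out) stays the same.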
Problem

Research questions and friction points this paper is trying to address.

Multi-payload Deployer
Parameter Calculation
State Verification
Innovation

Methods, ideas, or system contributions that make the work stand out.

Causal Learning
Reinforcement Learning
Adaptive Control