Autonomous Reasoning for Spacecraft Control: A Large Language Model Framework with Group Relative Policy Optimization

📅 2026-01-07

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work integrates large language models (LLMs) with Group Relative Policy Optimization (GRPO) to address two challenges in autonomous spacecraft control within complex dynamic environments: interpretability and stability. Using a two-stage training process (supervised fine-tuning followed by interactive reinforcement learning), the method introduces an explicit reasoning mechanism into GRPO for the first time and establishes a unified control framework applicable to both linear and nonlinear systems. The framework generates stable, feasible control policies on four control problems, including three-dimensional spacecraft attitude control, and also produces human-readable explanations of its decisions. These capabilities point to its potential for deployment in safety-critical systems where transparency and reliability are paramount.

📝 Abstract

This paper presents a learning-based guidance-and-control approach that couples a reasoning-enabled Large Language Model (LLM) with Group Relative Policy Optimization (GRPO). A two-stage procedure consisting of Supervised Fine-Tuning (SFT) to learn formatting and control primitives, followed by GRPO for interaction-driven policy improvement, trains controllers for each environment. The framework is demonstrated on four control problems spanning a gradient of dynamical complexity, from canonical linear systems through nonlinear oscillatory dynamics to three-dimensional spacecraft attitude control with gyroscopic coupling and thrust constraints. Results demonstrate that an LLM with explicit reasoning, optimized via GRPO, can synthesize feasible stabilizing policies under consistent training settings across both linear and nonlinear systems. The two-stage training methodology enables models to generate control sequences while providing human-readable explanations of their decision-making process. This work establishes a foundation for applying GRPO-based reasoning to autonomous control systems, with potential applications in aerospace and other safety-critical domains.
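
The "group relative" part of GRPO can be illustrated with a minimal sketch: several candidate outputs are sampled for the same prompt, each is scored, and every score is normalized against the mean and standard deviation of its own group. The abstract does not specify the reward function, the dynamics models, or any hyperparameters, so everything below is an assumption for illustration: the double-integrator plant, the quadratic cost in `rollout_reward`, and the three hand-written control sequences standing in for sampled LLM outputs are all hypothetical.

```python
import math
from typing import List, Sequence


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Normalize each reward against the mean and std of its own group."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + eps) for r in rewards]


def rollout_reward(controls: Sequence[float], x0: float = 1.0,
                   v0: float = 0.0, dt: float = 0.1) -> float:
    """Hypothetical reward: negative quadratic cost on a double integrator.

    This is an assumed stand-in, not the reward used in the paper.
    """
    x, v, cost = x0, v0, 0.0
    for u in controls:
        v += u * dt          # acceleration integrates into velocity
        x += v * dt          # velocity integrates into position
        cost += (x * x + v * v + 0.1 * u * u) * dt
    return -cost


# A group of candidate control sequences for the same initial state,
# standing in for multiple sampled completions from the policy.
group = [
    [0.0] * 20,                           # coast: apply no control
    [-1.0] * 5 + [1.0] * 5 + [0.0] * 10,  # push toward the origin, then brake
    [1.0] * 20,                           # push away from the origin
]

rewards = [rollout_reward(c) for c in group]
advantages = group_relative_advantages(rewards)

for r, a in zip(rewards, advantages):
    print(f"reward={r:9.3f}  advantage={a:+.3f}")
```

In this sketch the push-then-brake sequence receives the largest advantage and the push-away sequence the smallest, so a policy-gradient update weighted by these advantages would favor the former. No learned value function is needed because the baseline comes from the group itself.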

Problem

Research questions and friction points this paper is trying to address.

Autonomous Reasoning

Spacecraft Control

Large Language Model

Nonlinear Dynamics

Safety-Critical Systems

Innovation

Methods, ideas, or system contributions that make the work stand out.

Large Language Model

Group Relative Policy Optimization

Autonomous Control