Bi-Level Reinforcement Learning Control for an Underactuated Blimp via Center-of-Mass Reconfiguration

📅 2026-05-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses strong nonlinear coupling and too few control degrees of freedom in goal-directed tracking for an underactuated blimp equipped with only two thrusters and a movable internal slider. The authors propose a bi-level reinforcement learning framework that explicitly decouples center-of-mass (CoM) reconfiguration from thrust control: an outer policy selects a target-dependent CoM configuration before flight, while an inner policy generates thrust commands to track straight-line references. The architecture integrates a two-stage training strategy with a nonlinear dynamics model and includes a convergence analysis. Simulations and real-world experiments on a 27-goal evaluation set show that the proposed method consistently outperforms fixed-CoM baselines and PID-based controllers, achieving higher tracking accuracy, enhanced robustness, and reliable sim-to-real transfer.

📝 Abstract

This paper investigates goal-directed tracking control of underactuated blimps with center-of-mass (CoM) reconfiguration. Unlike conventional overactuated blimp designs that rely on redundant actuation for simplified control, this paper focuses on a compact architecture consisting of two thrusters and a movable internal slider, aiming to improve energy efficiency and payload capacity. This hardware-efficient configuration introduces significant underactuation and strong nonlinear coupling between CoM dynamics and vehicle motion. To address these challenges, this paper proposes a bi-level reinforcement learning framework that explicitly decouples task-level CoM planning from continuous thrust control. The outer policy determines a target-dependent CoM configuration prior to flight, while the inner policy generates thrust commands to track straight-line references. To ensure stable learning, this paper introduces a two-stage learning strategy, supported by a convergence analysis of the resulting bi-level process. Extensive simulations and real-world experiments on a 27-goal evaluation set demonstrate that the proposed method consistently outperforms fixed-CoM baselines and PID-based controllers, achieving higher tracking accuracy, enhanced robustness, and reliable sim-to-real transfer.
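
The bi-level split described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the planar kinematics, the constants `PITCH_PER_METRE` and `MAX_THRUST`, the candidate list `COM_CANDIDATES`, and all function names are hypothetical. A proportional rule stands in for the learned inner policy, and an exhaustive search stands in for the learned outer policy. The only structure taken from the abstract is that the CoM setting is chosen once per goal before flight and held fixed while the inner loop commands thrust.

```python
import math

# All constants below are illustrative assumptions, not values from the paper.
COM_CANDIDATES = [-0.10, -0.05, 0.0, 0.05, 0.10]  # hypothetical slider offsets (m)
PITCH_PER_METRE = 5.0  # toy assumption: pitch (rad) per metre of slider travel
MAX_THRUST = 1.0       # toy assumption: thrust maps directly to speed (m/s)


def inner_policy(pos, goal, heading):
    """Stand-in for the learned inner policy: proportional thrust along the heading."""
    remaining = (goal[0] - pos[0]) * heading[0] + (goal[1] - pos[1]) * heading[1]
    return min(max(remaining, 0.0), MAX_THRUST)


def rollout(com_offset, goal, steps=400, dt=0.05):
    """Fly one toy episode in the (x, z) plane with the CoM held fixed.

    Returns the final distance to the goal.
    """
    pitch = PITCH_PER_METRE * com_offset
    heading = (math.cos(pitch), math.sin(pitch))
    pos = (0.0, 0.0)
    for _ in range(steps):
        thrust = inner_policy(pos, goal, heading)
        pos = (pos[0] + thrust * heading[0] * dt,
               pos[1] + thrust * heading[1] * dt)
    return math.hypot(goal[0] - pos[0], goal[1] - pos[1])


def outer_policy(goal):
    """Stand-in for the learned outer policy: try every candidate CoM offset
    and keep the one whose rollout ends closest to the goal."""
    return min(COM_CANDIDATES, key=lambda c: rollout(c, goal))


goal = (2.0, 0.5)
com = outer_policy(goal)
print("chosen CoM offset:", com, "final error:", rollout(com, goal))
```

In this toy model the outer choice fixes the direction in which the vehicle can travel, so a poor CoM setting cannot be corrected by thrust alone. That mirrors, in a much simplified form, why the abstract treats CoM planning as a separate task-level decision.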

Problem

Research questions and friction points this paper is trying to address.

underactuated blimp

center-of-mass reconfiguration

goal-directed tracking control

nonlinear coupling

hardware-efficient configuration

Innovation

Methods, ideas, or system contributions that make the work stand out.

Bi-Level Reinforcement Learning

Underactuated Blimp

Center-of-Mass Reconfiguration