🤖 AI Summary
This work addresses the strong nonlinear coupling and limited control authority that arise in goal-directed tracking for an underactuated blimp equipped with only two thrusters and a movable internal slider. The authors propose a bi-level reinforcement learning framework that explicitly decouples center-of-mass (CoM) reconfiguration from thrust control: an outer policy selects a target-dependent CoM configuration, while an inner policy generates thrust commands to track straight-line references. The framework combines a two-stage training strategy with a nonlinear dynamics model and is supported by a convergence analysis. Simulations and real-world experiments on a 27-goal evaluation set show that the method consistently outperforms fixed-CoM baselines and PID-based controllers, achieving higher tracking accuracy, enhanced robustness, and reliable sim-to-real transfer.
📝 Abstract
This paper investigates goal-directed tracking control of underactuated blimps with center-of-mass (CoM) reconfiguration. Unlike conventional overactuated blimp designs, which rely on redundant actuation to simplify control, the platform studied here uses a compact architecture of two thrusters and a movable internal slider to improve energy efficiency and payload capacity. This hardware-efficient configuration introduces significant underactuation and strong nonlinear coupling between CoM dynamics and vehicle motion. To address these challenges, the paper proposes a bi-level reinforcement learning framework that explicitly decouples task-level CoM planning from continuous thrust control. The outer policy determines a target-dependent CoM configuration prior to flight, while the inner policy generates thrust commands to track straight-line references. To stabilize learning, a two-stage learning strategy is introduced, supported by a convergence analysis of the resulting bi-level process. Extensive simulations and real-world experiments on a 27-goal evaluation set demonstrate that the proposed method consistently outperforms fixed-CoM baselines and PID-based controllers, achieving higher tracking accuracy, enhanced robustness, and reliable sim-to-real transfer.
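
The split between a CoM decision made once before flight and a thrust policy that runs at every step can be illustrated with a minimal sketch. It is not the paper's method: the planar kinematic model, the constants `PITCH_PER_METER` and `MAX_SPEED`, and all function names are hypothetical. A grid search stands in for the learned outer policy, and a proportional rule stands in for the learned inner policy.

```python
import math

# All constants are hypothetical and chosen only for illustration.
PITCH_PER_METER = 2.0  # assumed trim pitch (rad) per metre of slider offset
MAX_SPEED = 1.0        # assumed saturation of the commanded forward speed (m/s)


def inner_policy(pos, pitch, goal, gain=1.5):
    """Stand-in for the learned thrust policy.

    Commands forward speed from the remaining range projected on the body axis.
    """
    along = (goal[0] - pos[0]) * math.cos(pitch) + (goal[1] - pos[1]) * math.sin(pitch)
    return max(0.0, min(MAX_SPEED, gain * along))


def rollout(slider, goal, steps=200, dt=0.1):
    """Fly from the origin with a fixed slider offset; return final goal distance."""
    pitch = PITCH_PER_METER * slider  # CoM is set once and held for the whole flight
    x, z = 0.0, 0.0
    for _ in range(steps):
        speed = inner_policy((x, z), pitch, goal)
        x += speed * math.cos(pitch) * dt
        z += speed * math.sin(pitch) * dt
    return math.hypot(goal[0] - x, goal[1] - z)


def outer_policy(goal, candidates):
    """Stand-in for the learned CoM planner: pick the offset with the best rollout."""
    return min(candidates, key=lambda s: rollout(s, goal))


goal = (4.0, 3.0)                            # (x, z) target in metres
candidates = [i / 20 for i in range(-8, 9)]  # slider offsets from -0.4 m to 0.4 m
best = outer_policy(goal, candidates)
print(f"chosen slider offset: {best:.2f} m")
print(f"final error, planned CoM: {rollout(best, goal):.3f} m")
print(f"final error, fixed CoM:   {rollout(0.0, goal):.3f} m")
```

In this toy setting the outer choice is evaluated with the inner rule held fixed, which loosely mirrors the idea of training the two levels in separate stages. The fixed-CoM rollout with a zero offset plays the role of the baseline the abstract compares against.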