🤖 AI Summary
This work addresses the lift fluctuations and instability that arise in bio-inspired quadrupedal underwater propulsion and are amplified by six-degree-of-freedom (6-DoF) fluid coupling. To balance propulsive efficiency against motion stability, the authors propose a safety-aware reinforcement learning approach that formulates quadrupedal swimming as a constrained optimization problem. They develop Accelerated Constrained Proximal Policy Optimization with a PID-regulated Lagrange multiplier (ACPPO-PID), in which the multiplier enforces the safety constraint. Conditional asymmetric clipping speeds up convergence, and cycle-wise geometric aggregation stabilizes policy updates. Policies are initialized with imitation learning, refined through on-hardware towing-tank experiments, and then transferred to free swimming. Experiments on a real quadrupedal robot show improved thrust efficiency, reduced destabilizing forces, and faster convergence than state-of-the-art baselines, and the learned policies transfer effectively to free-swimming trials.
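The constrained formulation can be written in the standard constrained-MDP form, with the multiplier set by a PID controller instead of plain gradient ascent. This is a generic sketch, not the authors' exact equations: the summary does not specify the reward, cost, threshold, or gains, so the thrust reward $r_t$, the destabilizing-force cost $c_t$, the cost limit $d$, and the gains $K_P, K_I, K_D$ are placeholder symbols, and the PID update follows the common PID-Lagrangian scheme (integral clamped at zero, derivative acting only on cost increases).

```latex
\begin{aligned}
&\max_{\theta}\; J_R(\pi_\theta) = \mathbb{E}_{\pi_\theta}\Big[\textstyle\sum_t \gamma^t r_t\Big]
\quad \text{s.t.} \quad
J_C(\pi_\theta) = \mathbb{E}_{\pi_\theta}\Big[\textstyle\sum_t \gamma^t c_t\Big] \le d, \\
&\mathcal{L}(\theta, \lambda) = J_R(\pi_\theta) - \lambda\,\big(J_C(\pi_\theta) - d\big), \qquad \lambda \ge 0, \\
&e_k = \hat{J}_C^{(k)} - d, \qquad I_k = \max\big(0,\, I_{k-1} + e_k\big), \\
&\lambda_{k+1} = \max\Big(0,\; K_P\, e_k + K_I\, I_k + K_D \max\big(0,\, \hat{J}_C^{(k)} - \hat{J}_C^{(k-1)}\big)\Big).
\end{aligned}
```

Here $\hat{J}_C^{(k)}$ is the measured average episode cost at policy iteration $k$. A plain Lagrangian method keeps only the integral term, which tends to overshoot and oscillate; the proportional and derivative terms let $\lambda$ react to violations sooner.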
📝 Abstract
Bio-inspired aquatic propulsion offers high thrust and maneuverability but is prone to destabilizing forces such as lift fluctuations, which are further amplified by six-degree-of-freedom (6-DoF) fluid coupling. We formulate quadrupedal swimming as a constrained optimization problem that maximizes forward thrust subject to a constraint on destabilizing fluctuations. Our framework, Accelerated Constrained Proximal Policy Optimization with a PID-regulated Lagrange multiplier (ACPPO-PID), enforces the constraint through this multiplier, accelerates learning via conditional asymmetric clipping, and stabilizes updates through cycle-wise geometric aggregation. Initialized with imitation learning and refined through on-hardware towing-tank experiments, ACPPO-PID produces control policies that transfer effectively to quadrupedal free-swimming trials. Results demonstrate improved thrust efficiency, reduced destabilizing forces, and faster convergence compared with state-of-the-art baselines, underscoring the importance of constraint-aware safe RL for robust and generalizable bio-inspired locomotion in complex fluid environments.
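The PID-regulated Lagrange multiplier can be illustrated with a minimal Python sketch. It assumes the generic PID-Lagrangian update (proportional term on the cost-limit violation, integral clamped at zero, derivative on cost increases only) and a hypothetical `combined_advantage` helper that folds λ into the PPO advantage using the (1 + λ) normalization found in some PPO-Lagrangian implementations. The class name, gains, and cost limit are illustrative assumptions, not the authors' values, and the sketch does not cover conditional asymmetric clipping or geometric aggregation, whose details the abstract does not give.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PIDLagrangian:
    """Hypothetical PID controller for the Lagrange multiplier lambda.

    The controlled error is the constraint violation: measured episode cost
    (e.g. an assumed lift-fluctuation penalty) minus the allowed cost limit.
    """
    kp: float          # proportional gain (illustrative, not from the paper)
    ki: float          # integral gain
    kd: float          # derivative gain
    cost_limit: float  # allowed average episode cost d
    integral: float = 0.0
    prev_cost: Optional[float] = None

    def update(self, episode_cost: float) -> float:
        """Return the new non-negative multiplier after one policy iteration."""
        error = episode_cost - self.cost_limit
        # Integral term, clamped at zero so it cannot wind up negative
        # while the constraint is satisfied.
        self.integral = max(0.0, self.integral + error)
        # Derivative term reacts only to increases in cost.
        if self.prev_cost is None:
            derivative = 0.0
        else:
            derivative = max(0.0, episode_cost - self.prev_cost)
        self.prev_cost = episode_cost
        lam = self.kp * error + self.ki * self.integral + self.kd * derivative
        # Project onto lambda >= 0, as required for an inequality constraint.
        return max(0.0, lam)


def combined_advantage(adv_reward: float, adv_cost: float, lam: float) -> float:
    """Fold the cost advantage into the reward advantage for the PPO surrogate.

    Dividing by (1 + lam) keeps the advantage scale bounded as lam grows.
    """
    return (adv_reward - lam * adv_cost) / (1.0 + lam)


if __name__ == "__main__":
    pid = PIDLagrangian(kp=0.1, ki=0.01, kd=0.05, cost_limit=1.0)
    for cost in [2.0, 1.5, 0.8]:
        lam = pid.update(cost)
        print(f"episode cost={cost:.2f} -> lambda={lam:.3f}")
```

In the usage loop, λ is positive while the episode cost exceeds the limit and returns to zero once the cost falls below it. Setting `ki` alone recovers the plain integral (gradient-ascent) multiplier update; the proportional and derivative gains are what let the multiplier respond to violations before they accumulate.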