🤖 AI Summary
Soft-bodied snake-like robots exhibit highly nonlinear dynamics that are strongly coupled with their interactions with the environment, which makes control exceptionally challenging. Existing model-based and bio-inspired approaches rely on oversimplified assumptions, while online deep reinforcement learning (DRL) incurs prohibitive costs and safety risks in real-world deployment. To address these limitations, we propose DiSA-IQL, a distribution-shift-aware offline RL framework built on implicit Q-learning (IQL). It integrates behavior cloning for policy initialization, conservative Q-function optimization, and a novel state-action reliability discrimination mechanism that suppresses policy updates driven by out-of-distribution (OOD) samples. In simulation experiments, DiSA-IQL achieves higher task success rates, smoother trajectories, and better cross-scenario robustness, consistently outperforming baselines including behavior cloning (BC), conservative Q-learning (CQL), and standard IQL.
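DiSA-IQL is built on implicit Q-learning. The following is a minimal sketch of standard IQL only, not of DiSA-IQL's additions. It computes three things: the expectile loss used to fit the state value V(s), the TD target that bootstraps Q(s, a) from V(s') instead of a maximum over actions, and the clipped advantage-weighted regression (AWR) weights used to extract the policy. The function names, the scalar-per-sample representation, and the default `max_weight` and `gamma` values are illustrative assumptions. A real implementation would operate on batched tensors produced by neural-network critics.

```python
import math
from typing import List

def expectile_loss(diff: float, tau: float) -> float:
    # Asymmetric squared loss |tau - 1(diff < 0)| * diff^2, with diff = Q(s, a) - V(s).
    # tau > 0.5 pushes V(s) toward an upper expectile of Q over in-dataset actions.
    weight = tau if diff >= 0 else 1.0 - tau
    return weight * diff * diff

def value_loss(q_values: List[float], v_values: List[float], tau: float) -> float:
    # Mean expectile loss over a batch of (s, a) samples drawn from the offline dataset.
    losses = [expectile_loss(q - v, tau) for q, v in zip(q_values, v_values)]
    return sum(losses) / len(losses)

def td_target(reward: float, next_v: float, done: bool, gamma: float = 0.99) -> float:
    # Q is regressed toward r + gamma * V(s'), so no action outside the dataset
    # is ever evaluated when forming the target.
    return reward + (0.0 if done else gamma * next_v)

def awr_weights(q_values: List[float], v_values: List[float], beta: float,
                max_weight: float = 100.0) -> List[float]:
    # Policy-extraction weights exp(beta * (Q - V)) that scale log pi(a|s).
    # The exponent is clipped at log(max_weight) to avoid overflow and to cap
    # the influence of any single sample.
    cap = math.log(max_weight)
    return [math.exp(min(beta * (q - v), cap)) for q, v in zip(q_values, v_values)]

# Example batch of three dataset samples.
q = [1.0, 2.0, 0.5]
v = [1.5, 1.5, 1.5]
print(value_loss(q, v, tau=0.7))
print(awr_weights(q, v, beta=3.0))
```

Because V is fitted only on dataset actions, IQL never queries Q at unseen actions. This is the property that makes it a natural base for a distribution-shift-aware extension.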
📝 Abstract
Soft snake robots offer remarkable flexibility and adaptability in complex environments, yet their control remains challenging due to highly nonlinear dynamics. Existing model-based and bio-inspired controllers rely on simplified assumptions that limit performance. Deep reinforcement learning (DRL) has recently emerged as a promising alternative, but online training is often impractical because it requires costly and potentially damaging real-world interactions. Offline RL provides a safer option by learning from pre-collected datasets, but it suffers from distribution shift, which degrades generalization to unseen scenarios. To overcome this challenge, we propose DiSA-IQL (Distribution-Shift-Aware Implicit Q-Learning), an extension of IQL that incorporates robustness modulation by penalizing unreliable state-action pairs to mitigate distribution shift. We evaluate DiSA-IQL on goal-reaching tasks in two settings: in-distribution and out-of-distribution evaluation. Simulation results show that DiSA-IQL consistently outperforms the baseline methods, namely Behavior Cloning (BC), Conservative Q-Learning (CQL), and vanilla IQL, achieving higher success rates, smoother trajectories, and improved robustness. The code is open-sourced to support reproducibility and to facilitate further research in offline RL for soft robot control.
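The abstract describes the robustness modulation only as penalizing unreliable state-action pairs. It does not say how reliability is measured. The following is therefore a hypothetical illustration, not the authors' mechanism. It assumes that reliability is estimated from disagreement across an ensemble of Q-estimates, a common proxy for poorly covered inputs. Under that assumption, it subtracts an uncertainty penalty from Q and scales an IQL-style AWR weight by a reliability factor in (0, 1]. The ensemble, the parameter names `penalty_coef` and `kappa`, and all function names are assumptions.

```python
import math
from statistics import mean, pstdev
from typing import Sequence

def penalized_q(q_ensemble: Sequence[float], penalty_coef: float) -> float:
    # Hypothetical pessimistic estimate: ensemble mean minus penalty_coef * std.
    # Large disagreement is treated as a sign that (s, a) is poorly covered by the data.
    return mean(q_ensemble) - penalty_coef * pstdev(q_ensemble)

def reliability(q_ensemble: Sequence[float], kappa: float) -> float:
    # Hypothetical reliability score in (0, 1]; equals 1.0 when all members agree.
    return math.exp(-kappa * pstdev(q_ensemble))

def modulated_policy_weight(q_ensemble: Sequence[float], v: float, beta: float,
                            penalty_coef: float, kappa: float,
                            max_weight: float = 100.0) -> float:
    # AWR weight exp(beta * A) computed from the penalized advantage, with the
    # exponent clipped for stability, then scaled by reliability so that
    # likely out-of-distribution pairs contribute less to the policy update.
    advantage = penalized_q(q_ensemble, penalty_coef) - v
    awr = math.exp(min(beta * advantage, math.log(max_weight)))
    return reliability(q_ensemble, kappa) * awr

# Two samples with the same mean Q but different ensemble agreement.
confident = modulated_policy_weight([2.0, 2.0, 2.0], v=2.0, beta=1.0,
                                    penalty_coef=0.5, kappa=1.0)
uncertain = modulated_policy_weight([1.0, 3.0], v=2.0, beta=1.0,
                                    penalty_coef=0.5, kappa=1.0)
print(confident, uncertain)
```

This sketch uses multiplicative down-weighting rather than hard rejection. Every sample still contributes a gradient, but poorly covered ones contribute less. A binary mask that drops samples below a reliability threshold would be an equally plausible reading of "penalizing unreliable state-action pairs".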