Learning to Recover: Dynamic Reward Shaping with Wheel-Leg Coordination for Fallen Robots

📅 2025-06-05
🤖 AI Summary
Wheeled-legged robots struggle to recover adaptively after a fall: methods built on preplanned motions, simplified dynamics, or sparse rewards often fail to yield robust recovery policies. This paper proposes a learning-based framework that combines Episode-based Dynamic Reward Shaping with curriculum learning to balance exploration of diverse recovery maneuvers against precise posture refinement. An asymmetric actor-critic architecture uses privileged simulation information to accelerate training, and noise-injected observations improve robustness to uncertainty. The authors also show that coordinated wheel-leg motion reduces joint torque consumption by 15.8% and 26.2% and improves stabilization through energy transfer. On two quadruped platforms, the method reaches recovery success rates of up to 99.1% and 97.8% without platform-specific tuning.

📝 Abstract
Adaptive recovery from fall incidents is an essential skill for the practical deployment of wheeled-legged robots, which uniquely combine the agility of legs with the speed of wheels for rapid recovery. However, traditional methods relying on preplanned recovery motions, simplified dynamics or sparse rewards often fail to produce robust recovery policies. This paper presents a learning-based framework integrating Episode-based Dynamic Reward Shaping and curriculum learning, which dynamically balances exploration of diverse recovery maneuvers with precise posture refinement. An asymmetric actor-critic architecture accelerates training by leveraging privileged information in simulation, while noise-injected observations enhance robustness against uncertainties. We further demonstrate that synergistic wheel-leg coordination reduces joint torque consumption by 15.8% and 26.2% and improves stabilization through energy transfer mechanisms. Extensive evaluations on two distinct quadruped platforms achieve recovery success rates up to 99.1% and 97.8% without platform-specific tuning. The supplementary material is available at https://boyuandeng.github.io/L2R-WheelLegCoordination/
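
The asymmetric actor-critic idea with noise-injected observations can be illustrated with a small sketch. This is not the authors' implementation: the function names, the Gaussian noise model, the noise level, and the example quantities (joint positions, friction, contact force) are all assumptions made here for illustration. The only point taken from the abstract is that the actor sees noisy observations while the critic may additionally use privileged simulator information during training.

```python
import random

def actor_observation(proprio, noise_std, rng):
    """Noisy copy of the onboard measurements (hypothetical Gaussian noise model)."""
    return [x + rng.gauss(0.0, noise_std) for x in proprio]

def critic_observation(proprio, privileged):
    """Clean measurements plus simulator-only quantities, used during training only."""
    return list(proprio) + list(privileged)

rng = random.Random(0)                # seeded for reproducibility
proprio = [0.1, -0.4, 0.9]            # e.g. joint positions (illustrative values)
privileged = [0.7, 12.5]              # e.g. friction, contact force (illustrative values)

actor_in = actor_observation(proprio, 0.05, rng)   # what a deployed policy would receive
critic_in = critic_observation(proprio, privileged)  # what the value function could receive
```

Because only the actor is needed at deployment time, the privileged entries never have to be measured on the real robot under this arrangement.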

Problem

Research questions and friction points this paper is trying to address.

Enabling wheel-legged robots to recover from falls robustly
Overcoming limitations of preplanned motions and sparse rewards
Optimizing wheel-leg coordination for energy-efficient recovery

Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic Reward Shaping balances exploration and refinement
Asymmetric actor-critic uses privileged simulation information
Wheel-leg coordination reduces torque and improves stabilization
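
One way to picture the balance between exploration and refinement is a reward whose weighting changes over the course of an episode. The sketch below is an assumption-laden illustration, not the paper's formulation: the power-law schedule, the `sharpness` parameter, and the two reward terms `r_explore` and `r_posture` are hypothetical names introduced here. Early steps favor the exploration term; late steps favor the posture term.

```python
def shaping_weight(step, episode_len, sharpness=3.0):
    """Hypothetical schedule: 0.0 at the start of an episode, 1.0 at the end."""
    progress = min(max(step / episode_len, 0.0), 1.0)
    return progress ** sharpness

def shaped_reward(step, episode_len, r_explore, r_posture, sharpness=3.0):
    """Blend an exploration term and a posture-refinement term by episode progress."""
    w = shaping_weight(step, episode_len, sharpness)
    return (1.0 - w) * r_explore + w * r_posture

# Illustrative values: the same two reward terms, evaluated at different episode steps.
early = shaped_reward(0, 100, r_explore=1.0, r_posture=-2.0)
middle = shaped_reward(50, 100, r_explore=1.0, r_posture=-2.0)
late = shaped_reward(100, 100, r_explore=1.0, r_posture=-2.0)
```

A larger `sharpness` keeps the exploration term dominant for longer before the posture term takes over; a curriculum could, under the same assumptions, adjust this parameter across training stages.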
Boyuan Deng
Humanoids and Human-Centered Mechatronics (HHCM), Istituto Italiano di Tecnologia, Via Morego 30, Genoa, 16163, Italy; Ph.D. Program of National Interest in Robotics and Intelligent Machines (DRIM), University of Genova, 16126 Genoa, Italy
Luca Rossini
Associate Professor in Statistics - University of Milan
Bayesian nonparametrics, Econometrics, Energy, Forecasting, Copula Models
Jin Wang
Humanoids and Human-Centered Mechatronics (HHCM), Istituto Italiano di Tecnologia, Via Morego 30, Genoa, 16163, Italy
Weijie Wang
PhD Student, Zhejiang University
Computer Vision, Efficient AI, Deep Learning
Nikolaos Tsagarakis
Humanoids and Human-Centered Mechatronics (HHCM), Istituto Italiano di Tecnologia, Via Morego 30, Genoa, 16163, Italy