MorFiC: Fixing Value Miscalibration for Zero-Shot Quadruped Transfer

📅 2026-03-15

🤖 AI Summary

Zero-shot policy transfer across diverse quadrupedal robot morphologies often fails because a shared critic network produces biased value estimates. This work proposes MorFiC, the first approach to introduce a morphology-aware modulation mechanism on the critic side. The modulation is conditioned on each robot’s physical and control parameters, so that value estimates, and the advantages derived from them, are calibrated per morphology. This addresses the incompatibility of value targets during multi-morphology training. With only a single shared policy, MorFiC achieves stable zero-shot transfer to unseen morphologies, improving locomotion speed by 16.1% on A1, approximately 2× on Cheetah, and approximately 5× on B1. The method is also deployed zero-shot on Unitree Go1 and Go2 robots without any fine-tuning.
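
The failure mode described above can be illustrated with a toy calculation. The sketch below is not taken from the paper: the robot names and return values are hypothetical, and the "critic" is reduced to the MSE-optimal constant prediction for a single state. It only shows why a critic that cannot see morphology ends up averaging incompatible value targets, and how that shifts the advantages.

```python
from statistics import mean

# Hypothetical Monte Carlo returns from the SAME observation on two morphologies.
# The numbers are made up for illustration only.
returns = {
    "light_robot": [10.0, 10.0, 10.0],
    "heavy_robot": [2.0, 2.0, 2.0],
}

def fit_shared_value(returns_by_morph):
    """MSE-optimal prediction when the critic cannot distinguish morphologies."""
    all_returns = [r for rs in returns_by_morph.values() for r in rs]
    return mean(all_returns)

def fit_conditioned_values(returns_by_morph):
    """MSE-optimal prediction when the critic sees which morphology it is on."""
    return {m: mean(rs) for m, rs in returns_by_morph.items()}

def advantages(returns_by_morph, value_of):
    """Advantage estimate as return minus predicted value."""
    return {m: [r - value_of(m) for r in rs] for m, rs in returns_by_morph.items()}

shared_v = fit_shared_value(returns)
cond_v = fit_conditioned_values(returns)

shared_adv = advantages(returns, lambda m: shared_v)
cond_adv = advantages(returns, lambda m: cond_v[m])

print("shared value:", shared_v)
print("shared advantages:", shared_adv)
print("conditioned advantages:", cond_adv)
```

In this toy setting the shared prediction sits between the two targets, so every action on the light robot looks better than average and every action on the heavy robot looks worse, regardless of action quality. Conditioning the value on morphology removes that offset.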

📝 Abstract

Generalizing learned locomotion policies across quadrupedal robots with different morphologies remains a challenge. Policies trained on a single robot often break when deployed on embodiments with different mass distributions, kinematics, joint limits, or actuation constraints, forcing per-robot retraining. We present MorFiC, a reinforcement learning approach for zero-shot cross-morphology locomotion using a single shared policy. MorFiC resolves a key failure mode in multi-morphology actor-critic training: a shared critic tends to average incompatible value targets across embodiments, yielding miscalibrated advantages. To address this, MorFiC conditions the critic via morphology-aware modulation driven by robot physical and control parameters, generating morphology-specific value estimates within a shared network. Trained on a single source robot with morphology randomization in simulation, MorFiC transfers to unseen robots and surpasses morphology-conditioned PPO baselines, improving stable average speed and longest stable run on multiple targets, with speed gains of +16.1% on A1, ~2x on Cheetah, and ~5x on B1. We additionally show that MorFiC reduces the variance of the value-prediction error across morphologies and stabilizes the advantage estimates, demonstrating that improved value-function calibration corresponds to stronger transfer performance. Finally, we demonstrate zero-shot deployment on two Unitree robots, Go1 and Go2, without fine-tuning, indicating that critic-side conditioning is a practical approach for cross-morphology generalization.
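
To make "morphology-aware modulation within a shared network" concrete, here is a minimal sketch of one way such a critic could look. The abstract does not specify the form of the modulation, so the feature-wise scale-and-shift used here is an assumption, as are the class name `ModulatedCritic`, the layer sizes, and the example morphology vectors. It is a forward pass only, with no training loop.

```python
import math
import random

def linear(weights, bias, x):
    """Dense layer: `weights` is a list of rows, one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def init_layer(rng, n_out, n_in, scale=0.5):
    """Small random weights and zero biases."""
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

class ModulatedCritic:
    """Toy critic whose hidden features are scaled and shifted per morphology.

    Assumption: scale-and-shift modulation; the paper's actual mechanism may differ.
    """

    def __init__(self, obs_dim, morph_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.w1, self.b1 = init_layer(rng, hidden_dim, obs_dim)
        # Modulator: maps morphology parameters to a per-unit scale and shift.
        self.wg, self.bg = init_layer(rng, hidden_dim, morph_dim)
        self.wb, self.bb = init_layer(rng, hidden_dim, morph_dim)
        self.w2, self.b2 = init_layer(rng, 1, hidden_dim)

    def features(self, obs):
        """Shared hidden features, computed from the observation alone."""
        return [math.tanh(z) for z in linear(self.w1, self.b1, obs)]

    def value(self, obs, morph):
        h = self.features(obs)
        # Scale is 1 + delta, so a zero modulator output leaves features unchanged.
        gamma = [1.0 + g for g in linear(self.wg, self.bg, morph)]
        beta = linear(self.wb, self.bb, morph)
        h_mod = [g * hi + b for g, hi, b in zip(gamma, h, beta)]
        return linear(self.w2, self.b2, h_mod)[0]

critic = ModulatedCritic(obs_dim=4, morph_dim=3, hidden_dim=8)
obs = [0.1, -0.2, 0.3, 0.05]

# Hypothetical normalized morphology descriptors (e.g. mass, leg length, control gain).
morph_small = [0.2, 0.3, 0.5]
morph_large = [0.9, 0.8, 0.4]

v_small = critic.value(obs, morph_small)
v_large = critic.value(obs, morph_large)
v_neutral = critic.value(obs, [0.0, 0.0, 0.0])
v_unmodulated = linear(critic.w2, critic.b2, critic.features(obs))[0]

print(v_small, v_large, v_neutral)
```

The same observation yields different value estimates for the two morphology vectors, while an all-zero morphology vector reduces the network to its unmodulated shared form. In an actual PPO setup the modulator would be trained jointly with the rest of the critic, and only the critic would need this extra input path.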

Problem

Research questions and friction points this paper is trying to address.

zero-shot transfer

quadruped locomotion

cross-morphology generalization

value miscalibration

morphology adaptation

Innovation

Methods, ideas, or system contributions that make the work stand out.

zero-shot transfer

cross-morphology generalization

value miscalibration