SCDP: Learning Humanoid Locomotion from Partial Observations via Mixed-Observation Distillation

📅 2026-03-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses robust locomotion control for humanoid robots using only onboard sensors, without access to full-state observations. The authors propose Sensor-Conditioned Diffusion Policies (SCDP), a diffusion-based policy trained with mixed-observation distillation: the model conditions on sensor histories while being supervised with privileged future state-action trajectories. Three further techniques, restricted denoising, context distribution alignment, and context-aware attention masking, encourage implicit state estimation inside the model and prevent train-deploy mismatch. The policy thus infers the motion state from sensor history alone and controls the robot end to end without explicit state estimation. In simulation, the method achieves 99–100% success on velocity-commanded locomotion and 93% tracking success on the AMASS test set, comparable to privileged baselines. The trained policy is also deployed on a real G1 humanoid robot at 50 Hz, demonstrating locomotion without external sensing or state estimation.
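To make "conditioning on sensor history" concrete, the following is a minimal sketch of a rolling observation window that a deployed policy could read at each control tick. The class name `SensorHistory`, the window length, the sensor dimension, and the choice to pad the window with the first reading are all illustrative assumptions; the source does not specify how the history is built.

```python
from collections import deque

import numpy as np

CONTROL_HZ = 50                    # control rate reported in the source
CONTROL_PERIOD_S = 1.0 / CONTROL_HZ  # 0.02 s between policy calls


class SensorHistory:
    """Fixed-length rolling window of onboard sensor readings (hypothetical helper)."""

    def __init__(self, horizon, sensor_dim):
        self.horizon = horizon
        self.sensor_dim = sensor_dim
        self._buf = deque(maxlen=horizon)

    def push(self, reading):
        reading = np.asarray(reading, dtype=np.float64)
        if reading.shape != (self.sensor_dim,):
            raise ValueError(f"expected shape ({self.sensor_dim},), got {reading.shape}")
        if not self._buf:
            # Assumption: pad with the first reading so the window is always full.
            self._buf.extend([reading] * self.horizon)
        else:
            # deque(maxlen=...) drops the oldest reading automatically.
            self._buf.append(reading)

    def as_array(self):
        # Shape (horizon, sensor_dim), oldest reading first.
        return np.stack(list(self._buf))


# Usage: push one reading per control tick, then hand the window to the policy.
history = SensorHistory(horizon=4, sensor_dim=3)
history.push([0.1, 0.0, -0.2])
history.push([0.2, 0.1, -0.1])
window = history.as_array()
```

On hardware, a loop of this kind would be paced to `CONTROL_PERIOD_S` per iteration; the pacing is left out here so the sketch stays deterministic.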

📝 Abstract

Distilling humanoid locomotion control from offline datasets into deployable policies remains a challenge, as existing methods rely on privileged full-body states that require complex and often unreliable state estimation. We present Sensor-Conditioned Diffusion Policies (SCDP), which enable humanoid locomotion using only onboard sensors, eliminating the need for explicit state estimation. SCDP decouples sensing from supervision through mixed-observation training: the diffusion model conditions on sensor histories while being supervised to predict privileged future state-action trajectories, forcing the model to infer the motion dynamics under partial observability. We further develop restricted denoising, context distribution alignment, and context-aware attention masking to encourage implicit state estimation within the model and to prevent train-deploy mismatch. We validate SCDP on velocity-commanded locomotion and motion reference tracking tasks. In simulation, SCDP achieves near-perfect success on velocity control (99-100%) and 93% tracking success on the AMASS test set, performing comparably to privileged baselines while using only onboard sensors. Finally, we deploy the trained policy on a real G1 humanoid at 50 Hz, demonstrating robust real-robot locomotion without external sensing or state estimation.
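The mixed-observation idea in the abstract can be illustrated with a small NumPy sketch: the denoiser receives only the sensor history as clean context, while the diffusion target is a privileged state-action trajectory. This is a sketch under stated assumptions, not the authors' implementation. The linear noise schedule, the noise-prediction loss, the linear `toy_denoiser`, and all tensor shapes are assumptions chosen for brevity, and restricted denoising, context distribution alignment, and attention masking are not modeled.

```python
import numpy as np


def make_alpha_bar(num_steps=100, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear DDPM-style schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def add_noise(clean_traj, noise, t, alpha_bar):
    """Forward diffusion: x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps."""
    a = alpha_bar[t][:, None, None]  # (batch, 1, 1) for broadcasting
    return np.sqrt(a) * clean_traj + np.sqrt(1.0 - a) * noise


def toy_denoiser(params, noisy_traj, t, sensor_hist, num_steps):
    """Hypothetical linear stand-in for the denoising network.

    The only clean input is sensor_hist; privileged states enter solely
    through the noised target trajectory.
    """
    batch = sensor_hist.shape[0]
    ctx = sensor_hist.reshape(batch, -1) @ params["w_ctx"]  # (batch, traj_dim)
    t_feat = (t / num_steps)[:, None, None]                 # (batch, 1, 1)
    return noisy_traj @ params["w_x"] + ctx[:, None, :] + t_feat * params["b_t"]


def mixed_observation_loss(params, sensor_hist, priv_traj, alpha_bar, rng):
    """Noise-prediction MSE on privileged trajectories, conditioned on sensors."""
    batch = priv_traj.shape[0]
    num_steps = alpha_bar.shape[0]
    t = rng.integers(0, num_steps, size=batch)     # random diffusion step per sample
    noise = rng.standard_normal(priv_traj.shape)
    noisy = add_noise(priv_traj, noise, t, alpha_bar)
    pred = toy_denoiser(params, noisy, t, sensor_hist, num_steps)
    return float(np.mean((pred - noise) ** 2))


# Usage with random placeholder data (all dimensions are illustrative).
rng = np.random.default_rng(0)
B, H, S, T, D = 8, 10, 12, 16, 20  # batch, history len, sensor dim, horizon, state+action dim
params = {
    "w_ctx": 0.01 * rng.standard_normal((H * S, D)),
    "w_x": 0.01 * rng.standard_normal((D, D)),
    "b_t": np.zeros(D),
}
sensor_hist = rng.standard_normal((B, H, S))  # onboard sensor history (policy input)
priv_traj = rng.standard_normal((B, T, D))    # privileged future state-action trajectory
alpha_bar = make_alpha_bar()
loss = mixed_observation_loss(params, sensor_hist, priv_traj, alpha_bar, rng)
print(f"loss = {loss:.3f}")
```

The design point is that privileged states never appear as a clean input: at deployment the model is still conditioned on sensor history only, so nothing it depends on is missing on the robot.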

Problem

Research questions and friction points this paper is trying to address.

humanoid locomotion

partial observability

sensor-based control

state estimation

offline imitation learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Sensor-Conditioned Diffusion Policies

Mixed-Observation Distillation

Implicit State Estimation