$\chi_{0}$: Resource-Aware Robust Manipulation via Taming Distributional Inconsistencies

📅 2026-02-09
📈 Citations: 1
Influential: 0
🤖 AI Summary
This work addresses error accumulation in long-horizon robotic manipulation caused by mismatches among the human demonstration distribution, the inductive bias learned by the policy, and the test-time execution distribution. The authors propose the χ₀ framework, which absorbs diverse demonstration distributions through weight-space merging (Model Arithmetic), introduces a stage-aware advantage estimator (Stage Advantage) that provides stable, dense progress signals, and aligns training and deployment distributions (Train-Deploy Alignment) through spatio-temporal augmentation, heuristic DAgger corrections, and temporal chunk-wise smoothing. Using only 20 hours of demonstration data and eight A100 GPUs, the method surpasses the state-of-the-art π₀.₅ baseline in success rate by nearly 250% (roughly 3.5× the baseline's rate). Two sets of dual-arm robots ran garment flattening, folding, and hanging autonomously for 24 consecutive hours.

📝 Abstract
High-reliability long-horizon robotic manipulation has traditionally relied on large-scale data and compute to understand complex real-world dynamics. However, we identify that the primary bottleneck to real-world robustness is not resource scale alone, but the distributional shift among the human demonstration distribution, the inductive bias learned by the policy, and the test-time execution distribution, a systematic inconsistency that causes compounding errors in multi-stage tasks. To mitigate these inconsistencies, we propose $\chi_{0}$, a resource-efficient framework with modules designed to achieve production-level robustness in robotic manipulation. Our approach builds on three technical pillars: (i) Model Arithmetic, a weight-space merging strategy that efficiently absorbs the diverse distributions of different demonstrations, ranging from object appearance to state variations; (ii) Stage Advantage, a stage-aware advantage estimator that provides stable, dense progress signals, overcoming the numerical instability of prior non-stage approaches; and (iii) Train-Deploy Alignment, which bridges the distribution gap via spatio-temporal augmentation, heuristic DAgger corrections, and temporal chunk-wise smoothing. $\chi_{0}$ enables two sets of dual-arm robots to collaboratively orchestrate long-horizon garment manipulation, spanning tasks from flattening and folding to hanging different clothes. Our method exhibits high-reliability autonomy: we are able to run the system from an arbitrary initial state for 24 consecutive hours. Experiments validate that $\chi_{0}$ surpasses the state-of-the-art $\pi_{0.5}$ in success rate by nearly 250%, with only 20 hours of data and 8 A100 GPUs. Code, data, and models will be released to facilitate the community.
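As a rough illustration of what weight-space merging means in general, the sketch below takes a weighted element-wise combination of checkpoints that share parameter names and shapes. It is not the authors' Model Arithmetic procedure, whose details are not given on this page. The function name `merge_checkpoints`, the flat-list parameter layout, and the equal coefficients are all assumptions made for the example.

```python
from typing import Dict, List, Sequence

# Hypothetical layout: parameter name -> flattened list of weights.
StateDict = Dict[str, List[float]]


def merge_checkpoints(checkpoints: Sequence[StateDict],
                      coeffs: Sequence[float]) -> StateDict:
    """Return the element-wise weighted sum of same-shaped checkpoints."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    if len(checkpoints) != len(coeffs):
        raise ValueError("one coefficient per checkpoint is required")

    merged: StateDict = {}
    for name, ref_values in checkpoints[0].items():
        acc = [0.0] * len(ref_values)
        for ckpt, coeff in zip(checkpoints, coeffs):
            values = ckpt[name]
            if len(values) != len(ref_values):
                raise ValueError(f"shape mismatch for parameter {name!r}")
            for i, value in enumerate(values):
                acc[i] += coeff * value
        merged[name] = acc
    return merged


# Toy checkpoints standing in for policies fine-tuned on different
# demonstration subsets (values are made up for illustration).
policy_a = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
policy_b = {"layer.weight": [3.0, 6.0], "layer.bias": [2.0]}

merged = merge_checkpoints([policy_a, policy_b], [0.5, 0.5])
print(merged)  # {'layer.weight': [2.0, 4.0], 'layer.bias': [1.0]}
```

With equal coefficients this reduces to plain parameter averaging. Unequal coefficients would weight one demonstration subset more heavily, and how to choose them is outside the scope of this sketch.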
Problem

Research questions and friction points this paper is trying to address.

distributional shift
robust manipulation
long-horizon robotic tasks
compounding errors
real-world robustness
Innovation

Methods, ideas, or system contributions that make the work stand out.

Distributional Consistency
Model Arithmetic
Stage Advantage
Train-Deploy Alignment
Resource-Efficient Robotic Manipulation
Checheng Yu
Nanjing University
Robotics, RL
Chonghao Sima
HKU | OpenDriveLab
computer vision, autonomous driving
Gangcheng Jiang
Kinetix AI
Hai Zhang
Kinetix AI
Haoguang Mai
Kinetix AI
Hongyang Li
Kinetix AI
Huijie Wang
OpenDriveLab
Embodied AI, Scene Understanding
Jin Chen
Kinetix AI
Kaiyang Wu
Kinetix AI
Li Chen
Kinetix AI
Lirui Zhao
Kinetix AI
Modi Shi
Beihang University
embodied ai
Ping Luo
National University of Defense Technology
distributed_computing
Qingwen Bu
HKU | OpenDriveLab
Robot Learning, Computer Vision, Machine Learning
Shijia Peng
Kinetix AI
Tianyu Li
Kinetix AI
Yibo Yuan
Kinetix AI