BEACON: Cross-Domain Co-Training of Generative Robot Policies via Best-Effort Adaptation

📅 2026-05-08

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

This work addresses how to use abundant source-domain data to train generalizable robot policies when only a few target-domain demonstrations are available. The authors propose a discrepancy-aware importance-reweighting co-training method that jointly learns a diffusion-based visuomotor policy and per-sample weights for the source-domain data. The method aligns cross-domain features implicitly, without an explicit alignment objective, balances the use of multiple source domains, and scales to high-dimensional sequential policies. In sim-to-sim, sim-to-real, and multi-source manipulation experiments, it outperforms three baselines (target-only training, fixed-ratio source-target mixing, and explicit feature alignment) in policy robustness and data efficiency.

📝 Abstract

We introduce BEACON (Best-Effort Adaptation for Cross-Domain Co-Training), a theory-driven framework for training generative robot policies with abundant source demonstrations and limited target demonstrations. BEACON casts cross-domain co-training as a discrepancy-aware importance-reweighting problem, jointly learning a diffusion-based visuomotor policy and per-sample source weights that minimize an objective informed by target-domain generalization guarantees. To make best-effort adaptation practical for high-dimensional sequence policies, we develop scalable instance-level discrepancy estimators, stochastic alternating updates for policy and weights, and a multi-source extension that balances heterogeneous source domains. Across sim-to-sim, sim-to-real, and multi-source manipulation settings, BEACON improves robustness and data efficiency over target-only, fixed-ratio co-training, and feature-alignment baselines. Importantly, even without an explicit alignment objective, BEACON achieves feature alignment as an implicit result of discrepancy-aware cross-domain co-training.
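To make the idea of alternating policy and weight updates concrete, here is a minimal toy sketch in Python. It is not the authors' implementation, and the abstract does not give the exact objective. Everything specific here is an assumption made for illustration: a linear regressor stands in for the diffusion policy, the per-sample discrepancy `d` is approximated by the squared residual of a target-only least-squares fit, the weights are updated with an exponentiated-gradient step regularized toward uniform weights, and the names `co_train`, `lam`, `mu`, and `eta` are hypothetical.

```python
import numpy as np


def co_train(Xt, yt, Xs, ys, d, steps=200, lr=0.05, eta=0.1, mu=1.0, lam=1.0):
    """Toy alternating updates of model parameters and per-sample source weights.

    Assumed objective (illustrative only):
        mean target loss + sum_i q_i * (source loss_i + lam * d_i)
        + mu * KL(q || uniform), with q on the probability simplex.
    """
    theta = np.zeros(Xt.shape[1])
    log_q0 = np.full(len(ys), -np.log(len(ys)))  # uniform reference weights
    log_q = log_q0.copy()
    for _ in range(steps):
        q = np.exp(log_q)
        # Model step: gradient descent on target loss plus q-weighted source loss.
        rt = Xt @ theta - yt
        rs = Xs @ theta - ys
        grad = 2.0 * Xt.T @ rt / len(yt) + 2.0 * Xs.T @ (q * rs)
        theta -= lr * grad
        # Weight step: exponentiated-gradient update, done in log space for stability.
        g = (Xs @ theta - ys) ** 2 + lam * d
        log_q -= eta * (g + mu * (log_q - log_q0))
        # Renormalize so the weights sum to one.
        m = log_q.max()
        log_q -= m + np.log(np.exp(log_q - m).sum())
    return theta, np.exp(log_q)


# Synthetic data: 10 target samples, 100 matched and 100 shifted source samples.
rng = np.random.default_rng(0)
w_t = np.array([1.0, -2.0, 0.5])
n = 100
Xt = rng.normal(size=(10, 3))
yt = Xt @ w_t + 0.01 * rng.normal(size=10)
Xs = rng.normal(size=(2 * n, 3))
ys = np.concatenate([Xs[:n] @ w_t, Xs[n:] @ (-w_t)]) + 0.01 * rng.normal(size=2 * n)

# Assumed discrepancy proxy: squared residual under a target-only fit.
w_hat, *_ = np.linalg.lstsq(Xt, yt, rcond=None)
d = (Xs @ w_hat - ys) ** 2

theta, q = co_train(Xt, yt, Xs, ys, d)
print("weight mass on matched source samples:", q[:n].sum())
print("parameter error:", np.linalg.norm(theta - w_t))
```

In this toy setting the learned weights concentrate on the source samples that agree with the target domain, which is the qualitative behavior a discrepancy-aware reweighting scheme is meant to produce. The regularizer `mu` keeps the weights from collapsing onto a handful of samples.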

Problem

Research questions and friction points this paper is trying to address.

cross-domain adaptation

robot policy learning

generative policies

domain discrepancy

data efficiency

Innovation

Methods, ideas, or system contributions that make the work stand out.

cross-domain co-training

discrepancy-aware reweighting

diffusion-based policy

instance-level discrepancy estimation

implicit feature alignment

🔎 Similar Papers

Towards Synergistic, Generalized, and Efficient Dual-System for Robotic Manipulation

2024-10-10 | arXiv.org | Citations: 0