🤖 AI Summary
To address the weak generalization and the difficulty of modeling physical interactions that embodied agents face on long-horizon household tasks in simulation, this paper proposes an end-to-end solution tailored to the BEHAVIOR-1K Challenge. Methodologically, we build a multi-stage curriculum learning framework on the π₀.₅ architecture, incorporating task-oriented data augmentation and ablation-driven training optimization to systematically characterize scaling laws across the pre-training and post-training phases. Our core contribution is the first methodology for training large vision-language-action models explicitly adapted to complex physical interactions, significantly improving robustness and generalization on long-horizon, multi-step embodied manipulation tasks. Evaluated on the 2025 BEHAVIOR Challenge, our approach secured second place, substantially outperforming all remaining competitors, demonstrating both effectiveness and strong reusability across embodied AI benchmarks.
📝 Abstract
The 2025 BEHAVIOR Challenge is designed to rigorously track progress toward solving long-horizon tasks with physical agents in simulated environments. BEHAVIOR-1K focuses on the everyday household tasks that people most want robots to assist with; these tasks introduce long-horizon mobile manipulation challenges in realistic settings, bridging the gap between current research and real-world, human-centric applications. This report presents our solution to the 2025 BEHAVIOR Challenge, which achieved a very close second place and substantially outperformed the remaining submissions. Building on $\pi_{0.5}$, we construct our solution systematically by studying the effects of training techniques and data. Through careful ablations, we show how scaling in both the pre-training and post-training phases drives competitive performance. We summarize practical lessons and design recommendations that we hope will provide actionable insights for the broader embodied AI community when adapting powerful foundation models to complex embodied scenarios.