EBT-Policy: Energy Unlocks Emergent Physical Reasoning Capabilities

📅 2025-10-31
🤖 AI Summary
Existing generative policies (e.g., Diffusion Policy) suffer from high computational overhead, exposure bias, and inference instability, leading to performance degradation under distribution shift. This paper introduces EBT-Policy, the first policy architecture to leverage scalable Energy-Based Transformers (EBTs) for embodied decision-making. It learns an energy landscape end-to-end, enabling equilibrium-based inference, uncertainty awareness, and dynamic compute allocation, with inference converging in as few as two steps on some tasks. Compared to diffusion-based policies, EBT-Policy substantially reduces training and inference cost in both simulated and real-robot tasks, achieves zero-shot recovery from failed action sequences, and is more robust under distribution shift. Its core contribution is the first successful application of EBTs to physically grounded policy learning, balancing efficiency, inference stability, and generalization.

📝 Abstract
Implicit policies parameterized by generative models, such as Diffusion Policy, have become the standard for policy learning and Vision-Language-Action (VLA) models in robotics. However, these approaches often suffer from high computational cost, exposure bias, and unstable inference dynamics, which lead to divergence under distribution shifts. Energy-Based Models (EBMs) address these issues by learning energy landscapes end-to-end and modeling equilibrium dynamics, offering improved robustness and reduced exposure bias. Yet, policies parameterized by EBMs have historically struggled to scale effectively. Recent work on Energy-Based Transformers (EBTs) demonstrates the scalability of EBMs to high-dimensional spaces, but their potential for solving core challenges in physically embodied models remains underexplored. We introduce a new energy-based architecture, EBT-Policy, that solves core issues in robotic and real-world settings. Across simulated and real-world tasks, EBT-Policy consistently outperforms diffusion-based policies, while requiring less training and inference computation. Remarkably, on some tasks it converges within just two inference steps, a 50x reduction compared to Diffusion Policy's 100. Moreover, EBT-Policy exhibits emergent capabilities not seen in prior models, such as zero-shot recovery from failed action sequences using only behavior cloning and without explicit retry training. By leveraging its scalar energy for uncertainty-aware inference and dynamic compute allocation, EBT-Policy offers a promising path toward robust, generalizable robot behavior under distribution shifts.
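The abstract's core idea, inferring an action by descending a learned energy landscape rather than running a long denoising chain, can be sketched as follows. The quadratic toy energy, `tanh` target, dimensions, and step size below are illustrative assumptions standing in for the paper's transformer-parameterized energy; only the inference loop is the point.

```python
import numpy as np

# Toy stand-in for a learned energy E(obs, action): a quadratic bowl whose
# minimum is a fixed function of the observation. In EBT-Policy this scalar
# would come from a transformer; here it is hand-built for illustration.
def energy(obs, act):
    target = np.tanh(obs[: act.size])  # pretend "correct" action for obs
    return float(np.sum((act - target) ** 2))

def energy_grad(obs, act):
    target = np.tanh(obs[: act.size])
    return 2.0 * (act - target)        # analytic gradient of the toy energy

def infer_action(obs, act_dim=4, steps=2, lr=0.5):
    """Equilibrium-style inference: start from a neutral action and descend
    the energy landscape. steps=2 mirrors the paper's observation that two
    refinement steps can suffice; the final scalar energy doubles as an
    uncertainty estimate for the chosen action."""
    act = np.zeros(act_dim)
    for _ in range(steps):
        act = act - lr * energy_grad(obs, act)
    return act, energy(obs, act)

obs = np.array([0.3, -1.2, 0.8, 2.0, 0.1])
action, residual_energy = infer_action(obs)
```

On this convex toy, `lr=0.5` reaches the minimum exactly; a learned landscape would need whatever step-size care the paper's training procedure provides.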
Problem

Research questions and friction points this paper is trying to address.

Addresses the high computational cost and inference instability of generative robot policies
Improves robustness to distribution shifts in physical environments
Enables zero-shot recovery from failed action sequences without explicit retry training
Innovation

Methods, ideas, or system contributions that make the work stand out.

First application of Energy-Based Transformers (EBTs) to physically grounded robot policy learning
Converges in as few as two inference steps, a 50x reduction over Diffusion Policy's 100
Achieves zero-shot recovery from failed action sequences using only behavior cloning, without explicit retry training
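The uncertainty-aware inference and dynamic compute allocation claims above can be sketched with an assumed control loop: treat the scalar energy as a confidence signal, refine only while confidence is low, and restart from a perturbed action if a step budget is exhausted. The threshold `tau`, the restart rule, and the toy energy are my assumptions for illustration, not the paper's algorithm.

```python
import numpy as np

def adaptive_inference(energy, energy_grad, obs, act_dim=4,
                       max_steps=50, lr=0.2, tau=1e-3, restarts=3):
    """Hypothetical dynamic-compute loop: refine the action only while the
    scalar energy stays above tau, and restart from a perturbed action when
    a budget of max_steps is exhausted, loosely mimicking zero-shot recovery
    from a failed action sequence."""
    rng = np.random.default_rng(0)
    act = np.zeros(act_dim)
    for attempt in range(restarts):
        for step in range(max_steps):
            if energy(obs, act) < tau:      # confident enough: stop early
                return act, attempt, step
            act = act - lr * energy_grad(obs, act)
        act = act + 0.1 * rng.standard_normal(act_dim)  # perturb, retry
    return act, restarts, max_steps

# Demo on a convex toy energy (stand-in for a learned EBT landscape).
toy_energy = lambda obs, act: float(np.sum((act - obs) ** 2))
toy_grad = lambda obs, act: 2.0 * (act - obs)
obs = np.array([1.0, -0.5, 0.25, 0.75])
act, attempts_used, steps_used = adaptive_inference(toy_energy, toy_grad, obs)
```

The restart branch never fires on this convex toy; it marks where a recovery behavior would plug in when the energy flags a failed trajectory.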