Evaluating Robot Policies in a World Model

📅 2025-05-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
Real-world policy evaluation for robots is costly, while hand-crafted simulators often suffer from low fidelity. Method: World-model-based Policy Evaluation (WPE) replaces the real environment with an action-conditioned video generation model. A Blockwise-Autoregressive Diffusion Transformer inference scheme enables efficient, long-horizon video synthesis with reduced error accumulation, and a vision-language model (VLM) serves as a generalizable reward function for Monte Carlo rollouts. Contribution/Results: The authors propose action-agreement metrics to quantify how faithfully the world model follows action inputs. Empirically, WPE underestimates policy values for in-distribution actions and overestimates them for out-of-distribution actions, yet it preserves the relative rankings of different policies. On robot arm motion, WPE mimics real executions with high fidelity, although emulating realistic object interaction remains challenging; it thus offers a low-cost, scalable evaluation step before real-world deployment.
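The action-agreement idea above can be made concrete: roll the world model forward on the same action sequence as a real video, then score how closely the two rollouts match. A minimal sketch, assuming frames are float arrays in [0, 1] and using mean per-frame PSNR as the agreement score (the paper proposes its own metrics; this specific choice is an assumption):

```python
import numpy as np

def action_consistency(gt_frames: np.ndarray, gen_frames: np.ndarray) -> float:
    """Mean per-frame PSNR between a ground-truth video and a world-model
    rollout conditioned on the same action sequence.
    Both inputs have shape (T, H, W, C) with values in [0, 1]."""
    assert gt_frames.shape == gen_frames.shape
    mse = np.mean((gt_frames - gen_frames) ** 2, axis=(1, 2, 3))
    mse = np.maximum(mse, 1e-10)          # avoid log(0) on identical frames
    psnr = 10.0 * np.log10(1.0 / mse)     # per-frame PSNR in dB
    return float(psnr.mean())

# A rollout that drifts from the ground truth scores lower agreement.
gt = np.random.default_rng(0).random((8, 16, 16, 3))
drifted = np.clip(gt + 0.1, 0.0, 1.0)
print(action_consistency(gt, gt) > action_consistency(gt, drifted))
```

A higher score indicates the world model is actually following the action input rather than hallucinating a plausible but unrelated continuation.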

📝 Abstract
Robotics has broad applications from automating house chores to taking care of patients. However, evaluating robot control policies is challenging, as real-world testing is expensive, while handcrafted simulations often fail to accurately reflect real-world conditions, resulting in poor correlation between simulated evaluation and real-world outcomes. In this work, we investigate World-model-based Policy Evaluation (WPE). We first train an action-conditioned video generation model as a proxy to real-world environments. To enable efficient rollouts of hundreds of interactive steps while mitigating error accumulation in the world model, we propose an inference scheme which we call Blockwise-Autoregressive Diffusion Transformer with adjustable context and decoding horizon lengths. To ensure that the world model indeed follows action input, we propose metrics based on the agreement between the ground truth video and generated video conditioned on the same sequence of actions to evaluate the world model. We then use the world model for policy evaluation by performing Monte Carlo rollouts in the world model while employing a vision-language model (VLM) as a reward function. Interestingly, we found that WPE tends to underestimate the policy values for in-distribution actions and overestimate policy values for out-of-distribution actions. Nevertheless, WPE preserves the relative rankings of different policies. In emulating real robot executions, WPE achieves high fidelity in mimicking robot arm movements as in real videos, while emulating highly realistic object interaction remains challenging. Despite this limitation, we show that a world model can serve as a starting point for evaluating robot policies before real-world deployment.
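The evaluation loop described in the abstract, Monte Carlo rollouts inside the world model with a VLM scoring each frame, can be sketched as follows. `world_model_step`, `vlm_reward`, and `policy` are stand-in callables, not the paper's API:

```python
def evaluate_policy(policy, world_model_step, vlm_reward,
                    init_obs, horizon=50, num_rollouts=10, gamma=0.99):
    """Monte Carlo estimate of a policy's discounted return inside a
    learned world model.  `policy(obs)` picks an action,
    `world_model_step(obs, action)` imagines the next observation, and
    `vlm_reward(obs)` scores it (e.g. a VLM judging task progress)."""
    returns = []
    for _ in range(num_rollouts):
        obs, ret, discount = init_obs, 0.0, 1.0
        for _ in range(horizon):
            action = policy(obs)
            obs = world_model_step(obs, action)   # imagined transition
            ret += discount * vlm_reward(obs)     # VLM-scored reward
            discount *= gamma
        returns.append(ret)
    return sum(returns) / len(returns)
```

Because no real robot is in the loop, hundreds of such rollouts can be run per policy, and the resulting value estimates, though biased in absolute terms, are used here only to rank policies against each other.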
Problem

Research questions and friction points this paper is trying to address.

Evaluating robot policies without costly real-world testing
Improving simulation accuracy to reflect real-world conditions
Using world models for reliable policy ranking
Innovation

Methods, ideas, or system contributions that make the work stand out.

Action-conditioned video generation model
Blockwise-Autoregressive Diffusion Transformer
Vision-language model as reward function
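The Blockwise-Autoregressive inference scheme listed above decodes a few frames at a time while conditioning only on a truncated window of recent frames, which is what keeps long rollouts both efficient and stable. A minimal sketch, where `denoise_block` is a hypothetical stand-in for the diffusion Transformer's block denoiser:

```python
def blockwise_rollout(denoise_block, frames, actions,
                      context_len=4, block_len=2):
    """Generate `block_len` frames per step, conditioning on at most
    `context_len` previous frames, until every action is consumed.
    `denoise_block(context, acts, n)` returns `n` new frames; its
    signature is an assumption, not the paper's interface."""
    frames = list(frames)
    t = 0
    while t < len(actions):
        n = min(block_len, len(actions) - t)
        context = frames[-context_len:]            # adjustable context window
        frames.extend(denoise_block(context, actions[t:t + n], n))
        t += n
    return frames
```

Tuning `context_len` and `block_len` trades off fidelity against speed and error accumulation, matching the "adjustable context and decoding horizon lengths" described in the abstract.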