Closing the Train-Test Gap in World Models for Gradient-Based Planning

📅 2025-12-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the objective mismatch between offline world-model training (next-state prediction) and gradient-based planning at inference time (action-sequence optimization). We propose a lightweight, architecture-agnostic data synthesis method applied during training that explicitly closes this train-test gap by synthesizing trajectory data suited to differentiable action optimization. Evaluated on multi-task object manipulation and navigation benchmarks, the approach enables gradient-based planners to match or surpass the gradient-free cross-entropy method (CEM) while reducing inference latency to 10% of CEM's, without modifying the model architecture or increasing inference complexity.

📝 Abstract
World models paired with model predictive control (MPC) can be trained offline on large-scale datasets of expert trajectories and enable generalization to a wide range of planning tasks at inference time. Compared to traditional MPC procedures, which rely on slow search algorithms or on iteratively solving optimization problems exactly, gradient-based planning offers a computationally efficient alternative. However, the performance of gradient-based planning has thus far lagged behind that of other approaches. In this paper, we propose improved methods for training world models that enable efficient gradient-based planning. We begin with the observation that although a world model is trained on a next-state prediction objective, it is used at test-time to instead estimate a sequence of actions. The goal of our work is to close this train-test gap. To that end, we propose train-time data synthesis techniques that enable significantly improved gradient-based planning with existing world models. At test time, our approach outperforms or matches the classical gradient-free cross-entropy method (CEM) across a variety of object manipulation and navigation tasks in 10% of the time budget.
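The planning procedure the abstract describes can be sketched on a toy problem. This is an illustrative sketch, not the paper's implementation: a linear model s' = A s + B a stands in for a learned world model, and the action sequence is optimized by backpropagating a goal-reaching loss through the rollout. All names and constants here are hypothetical.

```python
import numpy as np

def rollout(A, B, s0, actions):
    """Roll the toy world model forward through a sequence of actions."""
    s = s0
    for a in actions:
        s = A @ s + B @ a
    return s

def plan_gradient(A, B, s0, goal, horizon=5, steps=1000, lr=10.0):
    """Minimize ||s_H - goal||^2 over the action sequence by gradient descent."""
    rng = np.random.default_rng(0)
    actions = rng.normal(scale=0.1, size=(horizon, B.shape[1]))
    for _ in range(steps):
        s_final = rollout(A, B, s0, actions)
        grad_s = 2.0 * (s_final - goal)        # dL/ds_H
        for t in reversed(range(horizon)):
            actions[t] -= lr * (B.T @ grad_s)  # dL/da_t = B^T dL/ds_{t+1}
            grad_s = A.T @ grad_s              # dL/ds_t = A^T dL/ds_{t+1}
    return actions

# Double-integrator-style toy dynamics: position integrates velocity.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
s0 = np.zeros(2)
goal = np.array([1.0, 0.0])

actions = plan_gradient(A, B, s0, goal)
print(np.round(rollout(A, B, s0, actions), 3))
```

Because the loss is differentiated analytically through the rollout, each planning step costs one forward and one backward pass, which is the efficiency advantage the abstract claims over sampling-based search.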
Problem

Research questions and friction points this paper is trying to address.

Improving gradient-based planning performance in world models
Closing the train-test gap in model predictive control
Enabling offline-trained world models to support efficient action-sequence optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Train-time data synthesis to close train-test gap
Improved gradient-based planning for world models
Matches or outperforms the classical cross-entropy method (CEM) in 10% of its time budget
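For contrast, the gradient-free CEM baseline the paper compares against can be sketched on the same kind of toy linear model (s' = A s + B a): sample action sequences from a Gaussian, refit it to the lowest-cost elites, and repeat. Details, constants, and names here are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def rollout(A, B, s0, actions):
    """Roll the toy world model forward through a sequence of actions."""
    s = s0
    for a in actions:
        s = A @ s + B @ a
    return s

def plan_cem(A, B, s0, goal, horizon=5, iters=150, pop=300, n_elite=30):
    """Cross-entropy method: iteratively refit a sampling Gaussian to the elites."""
    rng = np.random.default_rng(0)
    mu = np.zeros((horizon, B.shape[1]))
    sigma = np.full_like(mu, 10.0)
    best, best_cost = mu, np.inf
    for _ in range(iters):
        samples = rng.normal(mu, sigma, size=(pop,) + mu.shape)
        costs = np.array([np.sum((rollout(A, B, s0, a) - goal) ** 2)
                          for a in samples])
        order = np.argsort(costs)
        if costs[order[0]] < best_cost:
            best, best_cost = samples[order[0]], costs[order[0]]
        elite = samples[order[:n_elite]]
        mu = elite.mean(axis=0)
        sigma = elite.std(axis=0) + 0.1  # small floor keeps exploration alive
    return best

A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
s0 = np.zeros(2)
goal = np.array([1.0, 0.0])

plan = plan_cem(A, B, s0, goal)
print(np.round(rollout(A, B, s0, plan), 3))
```

Note the cost structure: each CEM iteration requires `pop` full model rollouts, whereas a gradient step needs only one forward and one backward pass, which is why the paper's 10% time-budget comparison favors gradient-based planning once the train-test gap is closed.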
Arjun Parthasarathy
Columbia University
Nimit Kalra
Columbia University
Rohun Agrawal
Columbia University
Yann LeCun
Chief AI Scientist at Facebook & JT Schwarz Professor at the Courant Institute, New York University
AI, machine learning, computer vision, robotics, image compression
Oumayma Bounou
New York University
Pavel Izmailov
Anthropic; NYU
Machine Learning, Deep Learning, Language Models, Reasoning, AI Alignment
Micah Goldblum
Columbia University