AdaPower: Specializing World Foundation Models for Predictive Manipulation

📅 2025-12-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
World Foundation Models (WFMs) model visual dynamics well, but a gap between photorealistic generation and control-oriented precision limits their use in precise robotic manipulation. This paper proposes AdaPower, a lightweight adaptation framework that turns general-purpose WFMs into specialist world models and integrates them with Model Predictive Control (MPC) to guide pre-trained VLA policies. It introduces two components: Temporal-Spatial Test-Time Training (TS-TTT) for inference-time adaptation and Memory Persistence (MP) for long-horizon consistency. On the LIBERO benchmarks, the method improves task success rates by over 41% without policy retraining, while preserving computational efficiency and generalist capabilities.

📝 Abstract
World Foundation Models (WFMs) offer remarkable visual dynamics simulation capabilities, yet their application to precise robotic control remains limited by the gap between generative realism and control-oriented precision. While existing approaches use WFMs as synthetic data generators, they suffer from high computational costs and underutilization of pre-trained VLA policies. We introduce AdaPower (Adapt and Empower), a lightweight adaptation framework that transforms general-purpose WFMs into specialist world models through two novel components: Temporal-Spatial Test-Time Training (TS-TTT) for inference-time adaptation and Memory Persistence (MP) for long-horizon consistency. Integrated within a Model Predictive Control framework, our adapted world model empowers pre-trained VLAs, achieving over 41% improvement in task success rates on LIBERO benchmarks without policy retraining, while preserving computational efficiency and generalist capabilities.

Problem

Research questions and friction points this paper is trying to address.

Adapts general world models for precise robotic control
Reduces computational costs in using foundation models
Enhances pre-trained VLA policies without retraining

Innovation

Methods, ideas, or system contributions that make the work stand out.

Temporal-Spatial Test-Time Training for inference-time adaptation
Memory Persistence for long-horizon consistency
Lightweight adaptation framework for Model Predictive Control integration
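
The abstract describes a world model used inside a Model Predictive Control loop to guide a frozen, pre-trained policy. The following is a minimal sketch of that general pattern only, not the paper's implementation: the policy proposes several candidate action sequences, a world model rolls each one out, and the first action of the lowest-cost candidate is executed. Every name here (`toy_world_model`, `toy_policy`, `rollout_cost`, `mpc_step`, `run_episode`), the 2D point state, and the squared-distance cost are assumptions made for illustration; TS-TTT and Memory Persistence are not modeled.

```python
import random
from typing import Callable, List, Tuple

State = Tuple[float, ...]
Action = Tuple[float, ...]


def toy_world_model(state: State, action: Action) -> State:
    """Hypothetical stand-in for an adapted world model: predicts the next state."""
    return tuple(s + a for s, a in zip(state, action))


def toy_policy(state: State, goal: State, rng: random.Random, horizon: int) -> List[Action]:
    """Hypothetical stand-in for a frozen policy: proposes one noisy action sequence."""
    plan: List[Action] = []
    estimate = state  # the policy's own rough guess of where it will be
    for _ in range(horizon):
        nominal = tuple(max(-0.2, min(0.2, g - e)) for e, g in zip(estimate, goal))
        plan.append(tuple(n + rng.gauss(0.0, 0.1) for n in nominal))
        estimate = tuple(e + n for e, n in zip(estimate, nominal))
    return plan


def rollout_cost(state: State, plan: List[Action], goal: State,
                 world_model: Callable[[State, Action], State]) -> float:
    """Roll a candidate plan through the world model; score the predicted final state."""
    for action in plan:
        state = world_model(state, action)
    return sum((s - g) ** 2 for s, g in zip(state, goal))


def mpc_step(state: State, goal: State, rng: random.Random,
             num_candidates: int = 16, horizon: int = 3) -> Action:
    """Sample candidate plans from the policy and return the first action of the best one."""
    candidates = [toy_policy(state, goal, rng, horizon) for _ in range(num_candidates)]
    best = min(candidates, key=lambda p: rollout_cost(state, p, goal, toy_world_model))
    return best[0]


def run_episode(start: State, goal: State, steps: int = 30, seed: int = 0) -> State:
    """Receding-horizon loop: replan at every step, execute only the first action."""
    rng = random.Random(seed)
    state = start
    for _ in range(steps):
        action = mpc_step(state, goal, rng)
        # Assumption: the toy environment shares the world model's dynamics.
        state = toy_world_model(state, action)
    return state


if __name__ == "__main__":
    final_state = run_episode(start=(0.0, 0.0), goal=(1.0, 1.0))
    print("final state:", final_state)
```

The policy is never updated in this loop; only the choice among its proposals changes, which mirrors the "without policy retraining" claim at the level of control flow. In a real system the cost would come from a task-specific objective on predicted observations rather than a known goal coordinate.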
Yuhang Huang, National University of Defense Technology (Deep Learning, Computer Vision)
Shilong Zou, National University of Defense Technology
Jiazhao Zhang, Peking University (Embodied AI, Navigation, 3D Vision)
Xinwang Liu, National University of Defense Technology
Ruizhen Hu, Shenzhen University
Kai Xu, National University of Defense Technology