🤖 AI Summary
Despite their strong visual dynamics modeling capabilities, World Foundation Models (WFMs) suffer from a gap between photorealistic generation fidelity and precise control accuracy, limiting their applicability in exact robotic manipulation. This paper proposes a lightweight adaptation framework that transforms WFMs into prediction-oriented, task-specific manipulation models, integrated with Model Predictive Control (MPC) for efficient policy guidance. It introduces two key innovations, temporal-spatial test-time training and memory persistence, which enable adaptive inference-time adjustment and long-horizon consistency without policy retraining. Evaluated on the LIBERO benchmark, the method achieves over a 41% improvement in task success rate while preserving computational efficiency, cross-task generalizability, and high-fidelity action precision, effectively bridging the longstanding gap between generative realism and actionable control accuracy in vision-based robotic learning.
📝 Abstract
World Foundation Models (WFMs) offer remarkable visual dynamics simulation capabilities, yet their application to precise robotic control remains limited by the gap between generative realism and control-oriented precision. While existing approaches use WFMs as synthetic data generators, they suffer from high computational costs and underutilization of pre-trained VLA policies. We introduce **AdaPower** (**Ada**pt and Em**power**), a lightweight adaptation framework that transforms general-purpose WFMs into specialist world models through two novel components: Temporal-Spatial Test-Time Training (TS-TTT) for inference-time adaptation and Memory Persistence (MP) for long-horizon consistency. Integrated within a Model Predictive Control framework, our adapted world model empowers pre-trained VLAs, achieving over 41% improvement in task success rates on LIBERO benchmarks without policy retraining, while preserving computational efficiency and generalist capabilities.
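To make the abstract's control loop concrete, here is a minimal, hedged sketch of how a learned world model can guide actions inside a Model Predictive Control (MPC) loop via random shooting. This is a generic illustration, not AdaPower's actual implementation: `world_model`, `cost_fn`, `mpc_plan`, and all dynamics here are hypothetical toy stand-ins for the paper's adapted WFM and task objective.

```python
import numpy as np

def mpc_plan(world_model, cost_fn, state, horizon=5, num_candidates=64,
             action_dim=2, rng=None):
    """Random-shooting MPC: score candidate action sequences under a
    (learned) world model and return the first action of the best one."""
    rng = rng or np.random.default_rng(0)
    # Sample candidate action sequences uniformly in [-1, 1].
    candidates = rng.uniform(-1.0, 1.0, size=(num_candidates, horizon, action_dim))
    costs = np.zeros(num_candidates)
    for i, seq in enumerate(candidates):
        s = state
        for a in seq:
            s = world_model(s, a)   # predict next state with the world model
            costs[i] += cost_fn(s)  # accumulate predicted task cost
    best = int(np.argmin(costs))
    # Receding horizon: execute only the first action, then replan.
    return candidates[best, 0]

# Toy example: linear dynamics, quadratic cost toward the origin.
world_model = lambda s, a: s + 0.1 * a           # hypothetical dynamics
cost_fn = lambda s: float(np.sum(s ** 2))        # hypothetical objective
action = mpc_plan(world_model, cost_fn, state=np.array([1.0, -1.0]))
```

In the paper's setting, the sampled candidates would instead come from the pre-trained VLA policy and the world model would be the TS-TTT-adapted WFM, but the select-and-replan structure of MPC is the same.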