🤖 AI Summary
Open-world object manipulation typically relies on extensive task-specific robot demonstrations, which limits zero-shot generalization. This work proposes a framework that pairs an explicit world model with a physically grounded digital twin of the environment to achieve cross-object and cross-task zero-shot manipulation without any task-specific demonstrations. By combining open-set perception, digital-twin reconstruction, and simulation-driven sampling and evaluation of interaction strategies, the approach removes the dependence on large-scale action datasets that constrains Vision-Language-Action (VLA) models. Experiments show strong generalization across multiple open-set manipulation tasks, enabling robots to reason about and interact with previously unseen objects and tasks in diverse environments.
📝 Abstract
Open-world object manipulation remains a fundamental challenge in robotics. While Vision-Language-Action (VLA) models have demonstrated promising results, they rely heavily on large-scale robot action demonstrations, which are costly to collect and can hinder out-of-distribution generalization. In this paper, we propose an explicit-world-model-based framework for open-world manipulation that achieves zero-shot generalization by constructing a physically grounded digital twin of the environment. The framework integrates open-set perception, digital-twin reconstruction, and the sampling and evaluation of interaction strategies. Within this digital twin, our approach efficiently explores and evaluates manipulation strategies in a physics-enabled simulator and reliably deploys the chosen strategy in the real world. Experimentally, the proposed framework performs multiple open-set manipulation tasks without any task-specific action demonstrations, demonstrating strong zero-shot generalization at both the task and object levels. Project Page: https://bojack-bj.github.io/projects/thesis/
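To make the sample-and-evaluate step concrete, here is a minimal, hypothetical sketch of strategy selection in a digital twin: candidate strategies are sampled, each is rolled out in a stand-in for the physics-enabled simulator, and the highest-scoring one is kept for real-world deployment. All names here (`Strategy`, `sample_strategies`, the `rollout` callback) are illustrative assumptions, not the paper's actual API.

```python
import random
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Strategy:
    """A candidate manipulation strategy (e.g., grasp pose + motion parameters)."""
    params: List[float]

def sample_strategies(n: int, dim: int = 4) -> List[Strategy]:
    """Sample n candidate strategies. A real system would condition sampling on
    the perceived object geometry from the open-set perception module."""
    return [Strategy([random.uniform(-1.0, 1.0) for _ in range(dim)]) for _ in range(n)]

def evaluate_in_twin(strategy: Strategy, rollout: Callable[[Strategy], float]) -> float:
    """Roll the strategy out in the physics-enabled digital twin and return a
    task-success score; `rollout` stands in for the simulator interface."""
    return rollout(strategy)

def select_strategy(rollout: Callable[[Strategy], float], n_samples: int = 64) -> Strategy:
    """Sample-and-evaluate loop: return the best-scoring strategy for deployment."""
    candidates = sample_strategies(n_samples)
    return max(candidates, key=lambda s: evaluate_in_twin(s, rollout))

if __name__ == "__main__":
    # Toy rollout: the score is higher the closer the strategy is to a target
    # configuration, mimicking a task-success metric from the simulator.
    target = [0.3, -0.2, 0.5, 0.0]
    toy_rollout = lambda s: -sum((p - t) ** 2 for p, t in zip(s.params, target))
    best = select_strategy(toy_rollout)
    print("selected strategy:", best.params)
```

The key design point this sketch illustrates is that only the evaluation happens in simulation; the chosen strategy is then executed once in the real world, which is what lets the framework sidestep task-specific demonstration data.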