Actionable World Representation

📅 2026-05-18

🤖 AI Summary
This work addresses the lack of a unified, explicit representation of manipulable object states in existing physical world models, a gap that hinders the development of action-aware, general-purpose representations. The authors propose WorldString, a neural architecture that learns a unified, differentiable manifold of object affordance states directly from point clouds or RGB-D videos, yielding a differentiable digital twin of the object. The authors state that no prior method models object action states explicitly in this unified way. Because manipulability is embedded directly in the world representation and the structure is fully differentiable, WorldString is positioned as a general building block for physical world models that can later be integrated with policy learning and neural dynamics.
📝 Abstract
Inspired by the emergent behaviors in large language models that generalized human intelligence, the research community is pursuing similar emergent capabilities within world models, with an emphasis on modeling the physical world. Within the scope of physical world models, objects are the fundamental primitives that constitute physical reality. From humans to computers, nearly everything we interact with is an object. These objects are rarely static; they are actionable entities with varying states determined by their intrinsic properties. While current methods approach object action states either via video generation or dynamic scene reconstruction, none explicitly model this basic element in a unified, principled way to build an actionable object representation. We propose WorldString, a neural architecture capable of modeling the state manifold of real-world objects by learning directly from point clouds or RGB-D video streams. Serving as a versatile digital twin, it acts as a foundational building block for physical world models; thus, we name it WorldString. Notably, its fully differentiable structure enables future integration with policy learning and neural dynamics.
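To make the idea of a differentiable object state manifold concrete, the following toy sketch may help. It is not the WorldString architecture, which the abstract does not specify. It assumes a hypothetical 2D "door" whose entire state is one hinge angle, an analytic forward model in place of a learned network, and known point correspondences. All names (`door_points`, `loss_and_grad`, `fit_state`) are illustrative. The point it shows is only that when the map from state to geometry is differentiable, an object's state can be recovered from an observed point set by gradient descent.

```python
import math

def door_points(theta, rest_points):
    """Toy forward model: rotate 2D rest-pose points about a hinge at the origin."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in rest_points]

def loss_and_grad(theta, rest_points, observed):
    """Mean squared point error and its analytic derivative with respect to theta."""
    c, s = math.cos(theta), math.sin(theta)
    loss, grad = 0.0, 0.0
    for (x, y), (ox, oy) in zip(rest_points, observed):
        px, py = c * x - s * y, s * x + c * y      # predicted point
        dpx, dpy = -s * x - c * y, c * x - s * y   # d(px, py) / d(theta)
        rx, ry = px - ox, py - oy                  # residual
        loss += rx * rx + ry * ry
        grad += 2.0 * (rx * dpx + ry * dpy)
    n = len(rest_points)
    return loss / n, grad / n

def fit_state(rest_points, observed, theta0=0.0, lr=0.5, steps=200):
    """Recover the state (hinge angle) by plain gradient descent."""
    theta = theta0
    for _ in range(steps):
        _, g = loss_and_grad(theta, rest_points, observed)
        theta -= lr * g
    return theta

# Assumed example data: five points along the door, observed at an angle of 0.7 rad.
rest = [(0.2 * i, 0.0) for i in range(1, 6)]
true_theta = 0.7
observed = door_points(true_theta, rest)
estimated = fit_state(rest, observed)
```

In this sketch the one-dimensional angle plays the role of the state manifold. A learned model in the spirit of the abstract would replace the analytic rotation with a network trained on point clouds or RGB-D video, and the same gradient path is what would let a policy or a dynamics model be optimized through the representation.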
Problem

Research questions and friction points this paper is trying to address.

actionable object representation
world models
physical world modeling
object state manifold
digital twin
Innovation

Methods, ideas, or system contributions that make the work stand out.

actionable object representation
world model
state manifold
differentiable architecture
digital twin