PhysX-Anything: Simulation-Ready Physical 3D Assets from Single Image

📅 2025-11-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing 3D generation methods often neglect physical plausibility and joint articulation, limiting their applicability in embodied intelligence and interactive simulation. This paper introduces PhysX-Anything, the first vision-language model (VLM)-driven 3D generation framework explicitly designed for physics-based simulation. Given a single in-the-wild image, it reconstructs simulation-ready 3D assets with explicit geometry, articulated joints, and physical attributes such as mass and friction. Key contributions include: (1) an efficient 3D geometry tokenization scheme that cuts token consumption by 193×, allowing explicit geometry to fit within standard VLM token budgets; (2) PhysX-Mobility, a new dataset of more than 2K common real-world objects with rich physical annotations, expanding the object categories of prior physical 3D datasets by over 2×; and (3) simulation-based validation in a MuJoCo-style environment showing that the generated assets can be used directly for contact-rich robotic policy learning. Experiments demonstrate strong generative quality and robust generalization to diverse real-world images.

📝 Abstract
3D modeling is shifting from static visual representations toward physical, articulated assets that can be directly used in simulation and interaction. However, most existing 3D generation methods overlook key physical and articulation properties, thereby limiting their utility in embodied AI. To bridge this gap, we introduce PhysX-Anything, the first simulation-ready physical 3D generative framework that, given a single in-the-wild image, produces high-quality sim-ready 3D assets with explicit geometry, articulation, and physical attributes. Specifically, we propose the first VLM-based physical 3D generative model, along with a new 3D representation that efficiently tokenizes geometry. It reduces the number of tokens by 193x, enabling explicit geometry learning within standard VLM token budgets without introducing any special tokens during fine-tuning and significantly improving generative quality. In addition, to overcome the limited diversity of existing physical 3D datasets, we construct a new dataset, PhysX-Mobility, which expands the object categories in prior physical 3D datasets by over 2x and includes more than 2K common real-world objects with rich physical annotations. Extensive experiments on PhysX-Mobility and in-the-wild images demonstrate that PhysX-Anything delivers strong generative performance and robust generalization. Furthermore, simulation-based experiments in a MuJoCo-style environment validate that our sim-ready assets can be directly used for contact-rich robotic policy learning. We believe PhysX-Anything can substantially empower a broad range of downstream applications, especially in embodied AI and physics-based simulation.
Problem

Research questions and friction points this paper is trying to address.

Generating simulation-ready 3D assets from single images with physical properties
Overcoming limitations of existing 3D methods lacking articulation and physics attributes
Addressing limited diversity in physical 3D datasets for embodied AI applications
Innovation

Methods, ideas, or system contributions that make the work stand out.

VLM-based physical 3D generative model
Token-efficient 3D representation reducing token count by 193×
New PhysX-Mobility dataset with expanded object categories
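The paper does not spell out the tokenization scheme in this summary, but the core idea behind a 193× reduction, emitting far fewer tokens than one-per-voxel explicit geometry, can be illustrated with a toy sketch. Here a 3D occupancy grid is tokenized per non-empty patch instead of per voxel; the `block_tokens` function, the patch size, and the sphere test object are all illustrative assumptions, not the paper's actual representation.

```python
import numpy as np

def naive_tokens(grid):
    """Baseline: one token per voxel of the occupancy grid."""
    return grid.size

def block_tokens(grid, block=4):
    """Toy compaction (illustrative assumption, not the paper's scheme):
    partition the grid into block^3 patches and emit one token per
    non-empty patch, skipping empty space entirely."""
    d = grid.shape[0]
    tokens = 0
    for x in range(0, d, block):
        for y in range(0, d, block):
            for z in range(0, d, block):
                if grid[x:x+block, y:y+block, z:z+block].any():
                    tokens += 1
    return tokens

# Toy object: a solid sphere inside a 64^3 occupancy grid.
d = 64
coords = np.indices((d, d, d)) - d // 2
grid = (coords ** 2).sum(axis=0) <= (d // 4) ** 2

print(naive_tokens(grid))                        # 262144 (= 64^3)
print(block_tokens(grid))                        # far fewer patch tokens
print(naive_tokens(grid) / block_tokens(grid))   # compression ratio
```

Even this crude patch-level coding yields a reduction of two orders of magnitude on a mostly-empty grid, which is why a compact geometry code lets explicit shapes fit inside a standard VLM token budget without special tokens.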