DreamArt: Generating Interactable Articulated Objects from a Single Image

📅 2025-07-08
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Generating interactive articulated 3D assets (e.g., laptops, microwaves) from a single input image remains challenging due to the difficulty of jointly achieving accurate part segmentation, physically plausible joint modeling, and scalability to diverse object categories. Method: We propose the first end-to-end, single-image-driven framework for generating interactive articulated assets. It comprises three stages: (1) mask-guided robust part segmentation; (2) occlusion-robust motion prior learning via part-level non-occluded geometry completion and fine-tuned video diffusion models; and (3) high-fidelity 3D reconstruction using dual-quaternion motion optimization and global texture inpainting. Results: Our method significantly outperforms state-of-the-art approaches in geometric accuracy, visual realism, and kinematic plausibility. It enables plug-and-play interaction in AR/VR and embodied AI applications, establishing a new paradigm for single-image-driven intelligent object modeling.

Technology Category

Application Category

📝 Abstract
Generating articulated objects, such as laptops and microwaves, is a crucial yet challenging task with extensive applications in Embodied AI and AR/VR. Current image-to-3D methods primarily focus on surface geometry and texture, neglecting part decomposition and articulation modeling. Meanwhile, neural reconstruction approaches (e.g., NeRF or Gaussian Splatting) rely on dense multi-view or interaction data, limiting their scalability. In this paper, we introduce DreamArt, a novel framework for generating high-fidelity, interactable articulated assets from single-view images. DreamArt employs a three-stage pipeline: firstly, it reconstructs part-segmented and complete 3D object meshes through a combination of image-to-3D generation, mask-prompted 3D segmentation, and part amodal completion. Second, we fine-tune a video diffusion model to capture part-level articulation priors, leveraging movable part masks as prompt and amodal images to mitigate ambiguities caused by occlusion. Finally, DreamArt optimizes the articulation motion, represented by a dual quaternion, and conducts global texture refinement and repainting to ensure coherent, high-quality textures across all parts. Experimental results demonstrate that DreamArt effectively generates high-quality articulated objects, possessing accurate part shape, high appearance fidelity, and plausible articulation, thereby providing a scalable solution for articulated asset generation. Our project page is available at https://dream-art-0.github.io/DreamArt/.
Problem

Research questions and friction points this paper is trying to address.

Generating articulated objects from single images
Overcoming limitations in part decomposition and articulation modeling
Providing scalable solution for high-fidelity interactable assets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reconstructs part-segmented 3D meshes from single images
Fine-tunes video diffusion for part articulation priors
Optimizes dual quaternion articulation and texture refinement
🔎 Similar Papers
No similar papers found.
Ruijie Lu
Ruijie Lu
Peking University
computer vision
Y
Yu Liu
Tsinghua University, China
Jiaxiang Tang
Jiaxiang Tang
NVIDIA; Peking University
Computer ScienceComputer Vision
Junfeng Ni
Junfeng Ni
Tsinghua University
Computer Vision3D Reconstruction
Y
Yuxiang Wang
State Key Lab of General AI, Peking University, China
Diwen Wan
Diwen Wan
AIRCAS,PKU
Computer Vision
Gang Zeng
Gang Zeng
Peking University
Computer VisionPattern RecognitionComputer Graphics
Y
Yixin Chen
State Key Lab of General AI, BIGAI, China
S
Siyuan Huang
State Key Lab of General AI, BIGAI, China