🤖 AI Summary
Generating interactive articulated 3D assets (e.g., laptops, microwaves) from a single input image remains challenging due to the difficulty of jointly achieving accurate part segmentation, physically plausible joint modeling, and scalability to diverse object categories.
Method: We propose the first end-to-end, single-image-driven framework for generating interactive articulated assets. It comprises three stages: (1) mask-guided robust part segmentation; (2) occlusion-robust motion prior learning via part-level non-occluded geometry completion and fine-tuned video diffusion models; and (3) high-fidelity 3D reconstruction using dual-quaternion motion optimization and global texture inpainting.
Results: Our method significantly outperforms state-of-the-art approaches in geometric accuracy, visual realism, and kinematic plausibility. It enables plug-and-play interaction in AR/VR and embodied AI applications, establishing a new paradigm for single-image-driven intelligent object modeling.
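The third stage optimizes the articulation motion represented as a dual quaternion, which packs a rotation and a translation into a single algebraic object that composes and interpolates smoothly. As an illustrative sketch only (not the paper's actual optimization code), a minimal NumPy dual-quaternion rigid transform might look like:

```python
import numpy as np

def qmul(a, b):
    """Hamilton product of two quaternions in (w, x, y, z) order."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def qconj(q):
    """Quaternion conjugate (negate the vector part)."""
    return q * np.array([1.0, -1.0, -1.0, -1.0])

def dq_from_rt(q_r, t):
    """Build a dual quaternion (q_r, q_d) from a unit rotation
    quaternion q_r and a translation vector t: q_d = 0.5 * (0, t) ⊗ q_r."""
    q_t = np.array([0.0, *t])
    q_d = 0.5 * qmul(q_t, q_r)
    return q_r, q_d

def dq_apply(q_r, q_d, p):
    """Apply the encoded rigid transform (rotate, then translate) to point p."""
    q_p = np.array([0.0, *p])
    rotated = qmul(qmul(q_r, q_p), qconj(q_r))[1:]
    t = 2.0 * qmul(q_d, qconj(q_r))[1:]  # recover translation from the dual part
    return rotated + t

# Example: rotate 90 degrees about z (like a hinge), then translate by (1, 0, 0).
theta = np.pi / 2
q_r = np.array([np.cos(theta / 2), 0.0, 0.0, np.sin(theta / 2)])
q_r, q_d = dq_from_rt(q_r, np.array([1.0, 0.0, 0.0]))
p_new = dq_apply(q_r, q_d, np.array([1.0, 0.0, 0.0]))
# (1, 0, 0) rotates to (0, 1, 0), then shifts to (1, 1, 0)
```

In an articulation-optimization setting, the eight dual-quaternion components would be the free parameters fit against observed part motion; the function names and parameterization here are assumptions for illustration.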
📝 Abstract
Generating articulated objects, such as laptops and microwaves, is a crucial yet challenging task with extensive applications in Embodied AI and AR/VR. Current image-to-3D methods focus primarily on surface geometry and texture, neglecting part decomposition and articulation modeling. Meanwhile, neural reconstruction approaches (e.g., NeRF or Gaussian Splatting) rely on dense multi-view or interaction data, limiting their scalability. In this paper, we introduce DreamArt, a novel framework for generating high-fidelity, interactable articulated assets from single-view images. DreamArt employs a three-stage pipeline: first, it reconstructs part-segmented, complete 3D object meshes through a combination of image-to-3D generation, mask-prompted 3D segmentation, and part amodal completion. Second, it fine-tunes a video diffusion model to capture part-level articulation priors, using movable part masks as prompts and amodal images to mitigate ambiguities caused by occlusion. Finally, DreamArt optimizes the articulation motion, represented by a dual quaternion, and performs global texture refinement and repainting to ensure coherent, high-quality textures across all parts. Experimental results demonstrate that DreamArt generates high-quality articulated objects with accurate part shape, high appearance fidelity, and plausible articulation, thereby providing a scalable solution for articulated asset generation. Our project page is available at https://dream-art-0.github.io/DreamArt/.