🤖 AI Summary
Generating interactive articulated 3D assets (e.g., laptops, microwaves) from a single input image remains challenging due to the difficulty of jointly achieving accurate part segmentation, physically plausible joint modeling, and scalability to diverse object categories.
Method: We propose the first end-to-end, single-image-driven framework for generating interactive articulated assets. It comprises three stages: (1) mask-guided robust part segmentation; (2) occlusion-robust motion prior learning via part-level non-occluded geometry completion and fine-tuned video diffusion models; and (3) high-fidelity 3D reconstruction using dual-quaternion motion optimization and global texture inpainting.
Results: Our method significantly outperforms state-of-the-art approaches in geometric accuracy, visual realism, and kinematic plausibility. It enables plug-and-play interaction in AR/VR and embodied AI applications, establishing a new paradigm for single-image-driven intelligent object modeling.
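The third stage optimizes the articulation motion represented as a dual quaternion, which packs a rotation and a translation into a single algebraic object that composes and interpolates smoothly. As an illustrative sketch only (not the paper's actual optimization code), a minimal NumPy dual-quaternion rigid transform might look like:

```python
import numpy as np

def qmul(a, b):
    """Hamilton product of two quaternions in (w, x, y, z) order."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def qconj(q):
    """Quaternion conjugate (negate the vector part)."""
    return q * np.array([1.0, -1.0, -1.0, -1.0])

def dq_from_rt(q_r, t):
    """Build a dual quaternion (q_r, q_d) from a unit rotation
    quaternion q_r and a translation vector t: q_d = 0.5 * (0, t) ⊗ q_r."""
    q_t = np.array([0.0, *t])
    q_d = 0.5 * qmul(q_t, q_r)
    return q_r, q_d

def dq_apply(q_r, q_d, p):
    """Apply the encoded rigid transform (rotate, then translate) to point p."""
    q_p = np.array([0.0, *p])
    rotated = qmul(qmul(q_r, q_p), qconj(q_r))[1:]
    t = 2.0 * qmul(q_d, qconj(q_r))[1:]  # recover translation from the dual part
    return rotated + t

# Example: rotate 90 degrees about z (like a hinge), then translate by (1, 0, 0).
theta = np.pi / 2
q_r = np.array([np.cos(theta / 2), 0.0, 0.0, np.sin(theta / 2)])
q_r, q_d = dq_from_rt(q_r, np.array([1.0, 0.0, 0.0]))
p_new = dq_apply(q_r, q_d, np.array([1.0, 0.0, 0.0]))
# (1, 0, 0) rotates to (0, 1, 0), then shifts to (1, 1, 0)
```

In an articulation-optimization setting, the eight dual-quaternion components would be the free parameters fit against observed part motion; the function names and parameterization here are assumptions for illustration.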
📝 Abstract
Generating articulated objects, such as laptops and microwaves, is a crucial yet challenging task with extensive applications in Embodied AI and AR/VR. Current image-to-3D methods focus primarily on surface geometry and texture, neglecting part decomposition and articulation modeling. Meanwhile, neural reconstruction approaches (e.g., NeRF or Gaussian Splatting) rely on dense multi-view or interaction data, limiting their scalability. In this paper, we introduce DreamArt, a novel framework for generating high-fidelity, interactable articulated assets from single-view images. DreamArt employs a three-stage pipeline: first, it reconstructs part-segmented, complete 3D object meshes through a combination of image-to-3D generation, mask-prompted 3D segmentation, and part amodal completion. Second, it fine-tunes a video diffusion model to capture part-level articulation priors, using movable part masks as prompts and amodal images to mitigate ambiguities caused by occlusion. Finally, DreamArt optimizes the articulation motion, represented by a dual quaternion, and performs global texture refinement and repainting to ensure coherent, high-quality textures across all parts. Experimental results demonstrate that DreamArt generates high-quality articulated objects with accurate part shape, high appearance fidelity, and plausible articulation, thereby providing a scalable solution for articulated asset generation. Our project page is available at https://dream-art-0.github.io/DreamArt/.