AI Summary
This work addresses the challenge of single-image 3D generation, where achieving part-level structural diversity, multi-view consistency, and precise local editing remains difficult. To this end, we propose PartRAG, a novel framework that introduces a hierarchical contrastive retrieval mechanism to align image patches with latent representations from an external 3D part database, constructing editable part-level representations within a shared canonical space. By integrating a diffusion Transformer with a mask-based part editor, our method enables efficient interactive editing without requiring full regeneration. Experiments on Objaverse demonstrate that PartRAG significantly reduces Chamfer Distance (from 0.1726 to 0.1528), improves F-Score (from 0.7472 to 0.844), and produces sharper part boundaries and higher-fidelity details, with inference taking only 38 seconds and interactive edits completed in 5-8 seconds.
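For readers unfamiliar with the two reported metrics, the following is a minimal sketch of Chamfer Distance and F-Score between two point sets. It assumes one common convention (mean unsquared nearest-neighbor distance, summed over both directions, and a distance threshold `tau` for F-Score); the exact variant, point sampling, and threshold used in the PartRAG evaluation are not stated here, so absolute values from this sketch should not be compared with the numbers above.

```python
import math


def nearest_distances(src, dst):
    """For each point in src, Euclidean distance to its nearest neighbor in dst."""
    return [min(math.dist(p, q) for q in dst) for p in src]


def chamfer_distance(a, b):
    """Symmetric Chamfer Distance (assumed convention: mean unsquared
    nearest-neighbor distance, summed over both directions)."""
    d_ab = nearest_distances(a, b)
    d_ba = nearest_distances(b, a)
    return sum(d_ab) / len(d_ab) + sum(d_ba) / len(d_ba)


def f_score(pred, gt, tau):
    """Harmonic mean of precision and recall at distance threshold tau.

    precision: fraction of predicted points within tau of the ground truth.
    recall:    fraction of ground-truth points within tau of the prediction.
    """
    d_pg = nearest_distances(pred, gt)
    d_gp = nearest_distances(gt, pred)
    precision = sum(d <= tau for d in d_pg) / len(d_pg)
    recall = sum(d <= tau for d in d_gp) / len(d_gp)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Lower Chamfer Distance and higher F-Score both indicate a closer geometric match. The brute-force nearest-neighbor search here is quadratic in the number of points, so it is only suitable for small illustrative inputs.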
Abstract
Single-image 3D generation with part-level structure remains challenging: learned priors struggle to cover the long tail of part geometries and maintain multi-view consistency, and existing systems provide limited support for precise, localized edits. We present PartRAG, a retrieval-augmented framework that integrates an external part database with a diffusion transformer to couple generation with an editable representation. To overcome the first challenge, we introduce a Hierarchical Contrastive Retrieval module that aligns dense image patches with 3D part latents at both part and object granularity, retrieving from a curated bank of 1,236 part-annotated assets to inject diverse, physically plausible exemplars into denoising. To overcome the second challenge, we add a masked, part-level editor that operates in a shared canonical space, enabling swaps, attribute refinements, and compositional updates without regenerating the whole object while preserving non-target parts and multi-view consistency. PartRAG achieves competitive results on Objaverse, ShapeNet, and ABO (on Objaverse, reducing Chamfer Distance from 0.1726 to 0.1528 and raising F-Score from 0.7472 to 0.844), with inference in 38s and interactive edits in 5-8s. Qualitatively, PartRAG produces sharper part boundaries, better thin-structure fidelity, and robust behavior on articulated objects. Code: https://github.com/AIGeeksGroup/PartRAG. Website: https://aigeeksgroup.github.io/PartRAG.
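To make the idea of retrieval at both part and object granularity concrete, here is a small illustrative sketch of two-level scoring over a part bank. It is not the PartRAG implementation: the function names, the bank layout (`id`, `part_latent`, `object_latent`), the max-over-patches pooling, and the blending weight `alpha` are all assumptions made for illustration, and the embeddings are taken as already produced by some contrastively trained encoders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve_parts(patch_embs, object_emb, bank, k=2, alpha=0.5):
    """Return the ids of the top-k bank entries under a blended score.

    Hypothetical scoring, assumed for this sketch:
      part level:   best cosine match between any image patch and the part latent
      object level: cosine match between the pooled image embedding and the
                    latent of the object the part belongs to
      final score:  alpha * part_level + (1 - alpha) * object_level
    """
    scored = []
    for entry in bank:
        part_sim = max(cosine(p, entry["part_latent"]) for p in patch_embs)
        object_sim = cosine(object_emb, entry["object_latent"])
        scored.append((alpha * part_sim + (1 - alpha) * object_sim, entry["id"]))
    # Highest blended similarity first.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [part_id for _, part_id in scored[:k]]
```

In a system of this kind, the retrieved part latents would then condition the denoising process. Blending the two levels lets a part that matches a patch well still be ranked below one whose parent object better matches the overall image.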