PartRAG: Retrieval-Augmented Part-Level 3D Generation and Editing

πŸ“… 2026-02-18
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work addresses the challenge of single-image 3D generation, where achieving part-level structural diversity, multi-view consistency, and precise local editing remains difficult. To this end, we propose PartRAG, a novel framework that introduces a hierarchical contrastive retrieval mechanism to align image patches with latent representations from an external 3D part database, constructing editable part-level representations within a shared canonical space. By integrating a diffusion Transformer with a mask-based part editor, our method enables efficient interactive editing without requiring full regeneration. Experiments on Objaverse demonstrate that PartRAG significantly reduces Chamfer Distance (from 0.1726 to 0.1528), improves F-Score (from 0.7472 to 0.844), and produces sharper part boundaries and higher-fidelity details, with inference taking only 38 seconds and interactive edits completed in 5–8 seconds.

πŸ“ Abstract
Single-image 3D generation with part-level structure remains challenging: learned priors struggle to cover the long tail of part geometries and maintain multi-view consistency, and existing systems provide limited support for precise, localized edits. We present PartRAG, a retrieval-augmented framework that integrates an external part database with a diffusion transformer to couple generation with an editable representation. To overcome the first challenge, we introduce a Hierarchical Contrastive Retrieval module that aligns dense image patches with 3D part latents at both part and object granularity, retrieving from a curated bank of 1,236 part-annotated assets to inject diverse, physically plausible exemplars into denoising. To overcome the second challenge, we add a masked, part-level editor that operates in a shared canonical space, enabling swaps, attribute refinements, and compositional updates without regenerating the whole object while preserving non-target parts and multi-view consistency. PartRAG achieves competitive results on Objaverse, ShapeNet, and ABO, reducing Chamfer Distance from 0.1726 to 0.1528 and raising F-Score from 0.7472 to 0.844 on Objaverse, with inference of 38 seconds and interactive edits in 5–8 seconds. Qualitatively, PartRAG produces sharper part boundaries, better thin-structure fidelity, and robust behavior on articulated objects. Code: https://github.com/AIGeeksGroup/PartRAG. Website: https://aigeeksgroup.github.io/PartRAG.
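The paper itself does not include code on this page. As a rough, hypothetical sketch of the retrieval step the abstract describes, aligning image-patch embeddings against a bank of 3D part latents and selecting the top-k matches, a cosine-similarity lookup might look like the following (all names and shapes here are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

def retrieve_parts(patch_embs, part_bank, k=3):
    """Return indices of the k part latents best matching each image patch.

    patch_embs: (P, D) dense image-patch embeddings.
    part_bank:  (N, D) embeddings of the external 3D part database.
    """
    # L2-normalize both sides so dot products become cosine similarities,
    # the usual score in contrastive retrieval.
    patch = patch_embs / np.linalg.norm(patch_embs, axis=1, keepdims=True)
    bank = part_bank / np.linalg.norm(part_bank, axis=1, keepdims=True)
    sims = patch @ bank.T                    # (P, N) similarity matrix
    topk = np.argsort(-sims, axis=1)[:, :k]  # top-k part indices per patch
    return topk

# Toy usage: 4 patches against a bank of 10 parts, 16-dim embeddings.
rng = np.random.default_rng(0)
idx = retrieve_parts(rng.normal(size=(4, 16)), rng.normal(size=(10, 16)), k=3)
print(idx.shape)  # (4, 3)
```

In the full method the retrieved part latents would then condition the diffusion transformer during denoising; this sketch only covers the nearest-neighbor lookup.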
Problem

Research questions and friction points this paper is trying to address.

part-level 3D generation
single-image 3D reconstruction
multi-view consistency
localized editing
long-tail part geometries
Innovation

Methods, ideas, or system contributions that make the work stand out.

Retrieval-Augmented Generation
Part-Level 3D Editing
Hierarchical Contrastive Retrieval
Diffusion Transformer
Multi-View Consistency
πŸ”Ž Similar Papers
No similar papers found.
Peize Li, King's College London
Zeyu Zhang, Peking University
Hao Tang, Peking University