BlenderFusion: 3D-Grounded Visual Editing and Generative Compositing

📅 2025-06-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the challenge of controllably recomposing objects, cameras, and backgrounds in complex 3D scenes. Methodologically, it introduces a three-stage layering-editing-compositing framework: (1) parsing input images into semantically aligned, editable 3D entities; (2) performing geometry- and pose-consistent edits in Blender; and (3) fine-tuning a diffusion model, with source masking and simulated object jittering as training strategies, to jointly model the source and edited target views and preserve multi-view consistency. Key contributions include: (1) a generative compositing paradigm explicitly guided by editable 3D scene representations; (2) decoupled, independent control over objects, camera viewpoint, and background; and (3) state-of-the-art performance across diverse compositional editing tasks, with superior fidelity, geometric consistency, and controllability compared to existing methods.
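In code terms, the pipeline factors cleanly into three functions. The sketch below is a minimal structural outline in Python; every name, type, and signature is an illustrative placeholder, not the authors' released code.

```python
# Structural sketch of the layering-editing-compositing pipeline.
# All names, types, and signatures are hypothetical placeholders.
from dataclasses import dataclass
from typing import List

@dataclass
class Entity3D:
    """One semantically aligned, editable 3D entity (stage 1 output)."""
    name: str
    mesh_path: str   # reconstructed geometry, exported for Blender
    pose: list       # 4x4 world-space transform, row-major

def layering(image_path: str) -> List[Entity3D]:
    """Stage 1 (layering): segment the input image and lift each
    object into an editable 3D entity."""
    raise NotImplementedError("segmentation + depth/mesh lifting model")

def editing(entities: List[Entity3D], edit_script: str) -> str:
    """Stage 2 (editing): apply geometry- and pose-consistent edits in
    Blender and return a render of the edited (target) scene."""
    raise NotImplementedError("headless Blender run applying edit_script")

def compositing(source_image: str, target_render: str) -> str:
    """Stage 3 (compositing): the fine-tuned diffusion compositor fuses
    source context with the edited render into the final image."""
    raise NotImplementedError("diffusion compositor inference")
```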

📝 Abstract
We present BlenderFusion, a generative visual compositing framework that synthesizes new scenes by recomposing objects, camera, and background. It follows a layering-editing-compositing pipeline: (i) segmenting and converting visual inputs into editable 3D entities (layering), (ii) editing them in Blender with 3D-grounded control (editing), and (iii) fusing them into a coherent scene using a generative compositor (compositing). Our generative compositor extends a pre-trained diffusion model to process both the original (source) and edited (target) scenes in parallel. It is fine-tuned on video frames with two key training strategies: (i) source masking, enabling flexible modifications like background replacement; (ii) simulated object jittering, facilitating disentangled control over objects and camera. BlenderFusion significantly outperforms prior methods in complex compositional scene editing tasks.
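The two training strategies named in the abstract, source masking and simulated object jittering, are concrete enough to sketch. The Python below is a hedged illustration only: it assumes NumPy images and 4x4 object poses, and the masking probability and jitter magnitudes are invented for the example, not taken from the paper.

```python
# Hedged sketch of the two training strategies from the abstract.
# Shapes, probabilities, and magnitudes are assumptions for illustration.
import random
import numpy as np

def mask_source(source_img: np.ndarray, obj_mask: np.ndarray,
                p_mask: float = 0.5) -> np.ndarray:
    """With probability p_mask, blank out the masked source pixels so the
    compositor must synthesize them from the edited target, enabling
    flexible modifications such as background replacement."""
    if random.random() < p_mask:
        out = source_img.copy()
        out[obj_mask] = 0.0   # obj_mask: boolean (H, W) region to hide
        return out
    return source_img

def jitter_object_pose(pose: np.ndarray, t_std: float = 0.05,
                       r_std_deg: float = 5.0) -> np.ndarray:
    """Simulated object jittering: perturb one object's 4x4 pose between
    source and target views so object motion is decorrelated from camera
    motion, facilitating disentangled control over the two."""
    jittered = pose.copy()
    jittered[:3, 3] += np.random.normal(0.0, t_std, size=3)  # translation
    theta = np.radians(np.random.normal(0.0, r_std_deg))     # yaw jitter
    c, s = np.cos(theta), np.sin(theta)
    rz = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    jittered[:3, :3] = rz @ jittered[:3, :3]
    return jittered
```

The intent, per the abstract, is that masking forces the compositor to regenerate hidden source regions (hence background replacement works), while pose jitter exposes object motion independently of camera motion during training.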
Problem

Research questions and friction points this paper is trying to address.

How to controllably recompose objects, camera viewpoint, and background within a single complex scene
How to convert 2D visual inputs into editable 3D entities that support grounded, geometry-consistent control
How to fuse independently edited elements back into one coherent, photorealistic scene
Innovation

Methods, ideas, or system contributions that make the work stand out.

3D-grounded visual editing in Blender (see the bpy sketch below)
Generative compositor that processes the source and edited target scenes in parallel
Source-masking and simulated-object-jittering training strategies
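The Blender editing step can be illustrated with Blender's Python API (bpy). In the sketch below, the object name, transform values, and output path are hypothetical; only the bpy calls themselves are standard API.

```python
# Minimal bpy sketch of a 3D-grounded edit: move one reconstructed
# entity and the camera independently, then render the edited view.
# "entity_car", the transforms, and the filepath are hypothetical.
import math
import bpy

scene = bpy.context.scene

# Pose-consistent object edit: translate and rotate one entity.
obj = bpy.data.objects["entity_car"]        # hypothetical entity name
obj.location.x += 0.5                       # shift 0.5 m along world X
obj.rotation_euler.z += math.radians(30)    # yaw by 30 degrees

# Independent camera control: reposition the camera, objects untouched.
cam = scene.camera
cam.location = (2.0, -3.0, 1.5)

# Render the edited scene; this render conditions the compositor.
scene.render.filepath = "/tmp/edited_view.png"
bpy.ops.render.render(write_still=True)
```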