BlenderFusion: 3D-Grounded Visual Editing and Generative Compositing

📅 2025-06-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the challenge of controllably recomposing objects, cameras, and backgrounds in complex 3D scenes. Methodologically, it introduces a three-stage layering-editing-compositing framework: (1) parsing input images into semantically aligned, editable 3D entities; (2) performing geometry- and pose-consistent edits in Blender; and (3) fine-tuning a diffusion model, with source masking and simulated object jittering as training strategies, to jointly model the source and edited target views and preserve multi-view consistency. Key contributions include: (1) a generative compositing paradigm explicitly guided by editable 3D scene representations; (2) decoupled, independent control over objects, camera viewpoint, and background; and (3) state-of-the-art performance across diverse compositional editing tasks, with superior fidelity, geometric consistency, and controllability compared to existing methods.
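In code terms, the pipeline factors cleanly into three functions. The sketch below is a minimal structural outline in Python; every name, type, and signature is an illustrative placeholder, not the authors' released code.

```python
# Structural sketch of the layering-editing-compositing pipeline.
# All names, types, and signatures are hypothetical placeholders.
from dataclasses import dataclass
from typing import List

@dataclass
class Entity3D:
    """One semantically aligned, editable 3D entity (stage 1 output)."""
    name: str
    mesh_path: str   # reconstructed geometry, exported for Blender
    pose: list       # 4x4 world-space transform, row-major

def layering(image_path: str) -> List[Entity3D]:
    """Stage 1 (layering): segment the input image and lift each
    object into an editable 3D entity."""
    raise NotImplementedError("segmentation + depth/mesh lifting model")

def editing(entities: List[Entity3D], edit_script: str) -> str:
    """Stage 2 (editing): apply geometry- and pose-consistent edits in
    Blender and return a render of the edited (target) scene."""
    raise NotImplementedError("headless Blender run applying edit_script")

def compositing(source_image: str, target_render: str) -> str:
    """Stage 3 (compositing): the fine-tuned diffusion compositor fuses
    source context with the edited render into the final image."""
    raise NotImplementedError("diffusion compositor inference")
```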

📝 Abstract
We present BlenderFusion, a generative visual compositing framework that synthesizes new scenes by recomposing objects, camera, and background. It follows a layering-editing-compositing pipeline: (i) segmenting and converting visual inputs into editable 3D entities (layering), (ii) editing them in Blender with 3D-grounded control (editing), and (iii) fusing them into a coherent scene using a generative compositor (compositing). Our generative compositor extends a pre-trained diffusion model to process both the original (source) and edited (target) scenes in parallel. It is fine-tuned on video frames with two key training strategies: (i) source masking, enabling flexible modifications like background replacement; (ii) simulated object jittering, facilitating disentangled control over objects and camera. BlenderFusion significantly outperforms prior methods in complex compositional scene editing tasks.
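The two training strategies named in the abstract, source masking and simulated object jittering, are concrete enough to sketch. The Python below is a hedged illustration only: it assumes NumPy images and 4x4 object poses, and the masking probability and jitter magnitudes are invented for the example, not taken from the paper.

```python
# Hedged sketch of the two training strategies from the abstract.
# Shapes, probabilities, and magnitudes are assumptions for illustration.
import random
import numpy as np

def mask_source(source_img: np.ndarray, obj_mask: np.ndarray,
                p_mask: float = 0.5) -> np.ndarray:
    """With probability p_mask, blank out the masked source pixels so the
    compositor must synthesize them from the edited target, enabling
    flexible modifications such as background replacement."""
    if random.random() < p_mask:
        out = source_img.copy()
        out[obj_mask] = 0.0   # obj_mask: boolean (H, W) region to hide
        return out
    return source_img

def jitter_object_pose(pose: np.ndarray, t_std: float = 0.05,
                       r_std_deg: float = 5.0) -> np.ndarray:
    """Simulated object jittering: perturb one object's 4x4 pose between
    source and target views so object motion is decorrelated from camera
    motion, facilitating disentangled control over the two."""
    jittered = pose.copy()
    jittered[:3, 3] += np.random.normal(0.0, t_std, size=3)  # translation
    theta = np.radians(np.random.normal(0.0, r_std_deg))     # yaw jitter
    c, s = np.cos(theta), np.sin(theta)
    rz = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    jittered[:3, :3] = rz @ jittered[:3, :3]
    return jittered
```

The intent, per the abstract, is that masking forces the compositor to regenerate hidden source regions (hence background replacement works), while pose jitter exposes object motion independently of camera motion during training.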
Problem

Research questions and friction points this paper is trying to address.

How to controllably recompose objects, camera viewpoint, and background within a single complex scene
How to convert 2D visual inputs into editable 3D entities that support grounded, geometry-consistent control
How to fuse independently edited elements back into one coherent, photorealistic scene
Innovation

Methods, ideas, or system contributions that make the work stand out.

3D-grounded visual editing in Blender (see the bpy sketch below)
Generative compositor that processes the source and edited target scenes in parallel
Source-masking and simulated-object-jittering training strategies
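The Blender editing step can be illustrated with Blender's Python API (bpy). In the sketch below, the object name, transform values, and output path are hypothetical; only the bpy calls themselves are standard API.

```python
# Minimal bpy sketch of a 3D-grounded edit: move one reconstructed
# entity and the camera independently, then render the edited view.
# "entity_car", the transforms, and the filepath are hypothetical.
import math
import bpy

scene = bpy.context.scene

# Pose-consistent object edit: translate and rotate one entity.
obj = bpy.data.objects["entity_car"]        # hypothetical entity name
obj.location.x += 0.5                       # shift 0.5 m along world X
obj.rotation_euler.z += math.radians(30)    # yaw by 30 degrees

# Independent camera control: reposition the camera, objects untouched.
cam = scene.camera
cam.location = (2.0, -3.0, 1.5)

# Render the edited scene; this render conditions the compositor.
scene.render.filepath = "/tmp/edited_view.png"
bpy.ops.render.render(write_still=True)
```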