🤖 AI Summary
Existing text-to-image generation methods struggle to precisely control scene geometry and material properties. This work proposes G-Render, the first framework to integrate diffusion models with G-buffer representations—comprising albedo, surface normals, depth, roughness, and metallic maps—decoupling structural generation from photorealistic rendering. A multi-task diffusion network jointly predicts full-channel G-buffers, which are then fed into a modular, differentiable neural renderer to synthesize high-fidelity images. This design enables fine-grained geometric and material-level editing, including cross-channel object compositing, localized lighting control via masks, and high-fidelity fusion of virtual objects into real-world scenes. Extensive experiments on text-to-image synthesis and editing tasks demonstrate significant improvements in controllability and editing flexibility, achieving seamless embedding of synthetic objects into authentic environments.
📝 Abstract
Despite recent advances in text-to-image generation, controlling geometric layout and material properties in synthesized scenes remains challenging. We present a novel pipeline that first produces a G-buffer (albedo, normals, depth, roughness, and metallic) from a text prompt and then renders a final image through a modular neural network. This intermediate representation enables fine-grained editing: users can copy and paste within specific G-buffer channels to insert or reposition objects, or apply masks to the irradiance channel to adjust lighting locally. As a result, real objects can be seamlessly integrated into virtual scenes, and virtual objects can be placed into real environments with high fidelity. By separating scene decomposition from image rendering, our method offers a practical balance between detailed post-generation control and efficient text-driven synthesis. We demonstrate its effectiveness on a variety of examples, showing that G-buffer editing significantly extends the flexibility of text-guided image generation.
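The editing operations described above—channel-level copy-paste to insert objects and masked adjustment of the irradiance channel for local lighting—can be sketched as plain array manipulation on a G-buffer. The sketch below is a minimal illustration in NumPy; the channel names, shapes, and helper functions (`paste_object`, `scale_lighting`) are illustrative assumptions, not the paper's actual API, and the real pipeline feeds the edited G-buffer to a neural renderer rather than displaying it directly.

```python
import numpy as np

# Hypothetical minimal G-buffer: the per-pixel maps named in the abstract,
# plus the irradiance channel used for local lighting edits. Shapes and
# channel names are assumptions for illustration.
H, W = 64, 64
gbuffer = {
    "albedo":     np.full((H, W, 3), 0.5, dtype=np.float32),
    "normal":     np.tile(np.array([0.0, 0.0, 1.0], np.float32), (H, W, 1)),
    "depth":      np.ones((H, W, 1), dtype=np.float32),
    "roughness":  np.full((H, W, 1), 0.8, dtype=np.float32),
    "metallic":   np.zeros((H, W, 1), dtype=np.float32),
    "irradiance": np.ones((H, W, 3), dtype=np.float32),
}

def paste_object(gbuf, src_gbuf, mask,
                 channels=("albedo", "normal", "depth", "roughness", "metallic")):
    """Copy-paste edit: where mask is set, overwrite the listed
    G-buffer channels with values from a source G-buffer."""
    m = mask[..., None]  # (H, W, 1), broadcasts over channel dims
    for c in channels:
        gbuf[c] = np.where(m, src_gbuf[c], gbuf[c])
    return gbuf

def scale_lighting(gbuf, mask, gain):
    """Localized lighting edit: scale irradiance inside the mask only."""
    m = mask[..., None].astype(np.float32)
    gbuf["irradiance"] = gbuf["irradiance"] * (1.0 + (gain - 1.0) * m)
    return gbuf

# Example: insert a red, fully metallic "virtual object" into the
# top-left quadrant, then brighten that region's lighting by 2x.
obj = {k: v.copy() for k, v in gbuffer.items()}
obj["albedo"][:] = [1.0, 0.0, 0.0]
obj["metallic"][:] = 1.0

mask = np.zeros((H, W), dtype=bool)
mask[:32, :32] = True

edited = paste_object({k: v.copy() for k, v in gbuffer.items()}, obj, mask)
edited = scale_lighting(edited, mask, gain=2.0)
# `edited` would then be passed to the neural renderer to produce the image.
```

Keeping edits in G-buffer space rather than pixel space is what makes them consistent: the renderer re-shades the pasted region under the scene's lighting instead of blending raw pixels.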