🤖 AI Summary
Virtual furniture synthesis faces two challenges: the lack of standardized benchmarks and the difficulty of achieving high synthesis fidelity while preserving background integrity. To address this, we propose the first parameter-sharing dual-diffusion backbone architecture specifically designed for this task, unifying feature extraction and inpainting for both reference furniture objects and real indoor backgrounds. We introduce RoomBench++, a large-scale, reproducible benchmark comprising 112K training pairs, integrating multi-source data from photorealistic rendering and video capture. Our method is compatible with U-Net and DiT backbones and enforces cross-modal feature alignment via parameter sharing. Extensive evaluation demonstrates state-of-the-art performance across quantitative metrics (e.g., FID, LPIPS), visual quality, and human preference studies. Moreover, our approach exhibits strong zero-shot generalization to unseen indoor layouts and generic scenes. The code and dataset are publicly released.
📝 Abstract
Virtual furniture synthesis, which seamlessly integrates reference objects into indoor scenes while maintaining geometric coherence and visual realism, holds substantial promise for home design and e-commerce applications. However, this field remains underexplored due to the scarcity of reproducible benchmarks and the limitations of existing image composition methods in achieving high-fidelity furniture synthesis while preserving background integrity. To overcome these challenges, we first present RoomBench++, a comprehensive and publicly available benchmark dataset tailored for this task. It consists of 112,851 training pairs and 1,832 testing pairs drawn from both real-world indoor videos and realistic home design renderings, thereby supporting robust training and evaluation under practical conditions. We then propose RoomEditor++, a versatile diffusion-based architecture featuring a parameter-sharing dual diffusion backbone, which is compatible with both U-Net and DiT architectures. This design unifies the feature extraction and inpainting processes for reference and background images. Our in-depth analysis reveals that the parameter-sharing mechanism enforces aligned feature representations, facilitating precise geometric transformations, texture preservation, and seamless integration. Extensive experiments validate that RoomEditor++ outperforms state-of-the-art approaches in terms of quantitative metrics, qualitative assessments, and human preference studies, while highlighting its strong generalization to unseen indoor scenes and general scenes without task-specific fine-tuning. The dataset and source code are available at https://github.com/stonecutter-21/roomeditor.
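The core idea of the parameter-sharing dual backbone can be illustrated with a minimal sketch (this is not the authors' code; all names, shapes, and the linear projection are hypothetical stand-ins for the shared diffusion backbone): a single set of weights extracts features from both the reference furniture tokens and the background tokens, so the two streams land in one aligned feature space before joint processing.

```python
import numpy as np

rng = np.random.default_rng(0)

# One shared weight matrix plays the role of the parameter-shared backbone:
# both the reference (furniture) stream and the background stream are
# projected with the *same* parameters, which is what keeps their feature
# representations aligned.
W_shared = rng.standard_normal((64, 32))

def extract_features(tokens: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Project image tokens into the shared feature space."""
    return tokens @ W

ref_tokens = rng.standard_normal((16, 64))   # tokens from the reference image
bg_tokens = rng.standard_normal((256, 64))   # tokens from the background image

ref_feat = extract_features(ref_tokens, W_shared)
bg_feat = extract_features(bg_tokens, W_shared)

# Concatenating the two aligned streams lets a joint attention step (omitted
# here) move reference appearance into the masked background region.
joint = np.concatenate([bg_feat, ref_feat], axis=0)  # shape (272, 32)
```

Because the projection weights are identical for both streams, any geometric or texture cue learned for one stream is directly comparable in the other, which is the intuition behind the "aligned feature representations" claim in the abstract.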