RoomEditor++: A Parameter-Sharing Diffusion Architecture for High-Fidelity Furniture Synthesis

📅 2025-12-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
Virtual furniture synthesis faces two challenges: the lack of standardized benchmarks and the difficulty of achieving high-fidelity synthesis while preserving background integrity. To address these, we propose the first parameter-sharing dual-diffusion backbone architecture designed for this task, unifying feature extraction and inpainting for both reference furniture objects and real indoor backgrounds. We introduce RoomBench++, a large-scale, reproducible benchmark comprising 112K training pairs that integrates multi-source data from photorealistic rendering and video capture. Our method is compatible with both U-Net and DiT backbones and enforces cross-modal feature alignment via parameter sharing. Extensive evaluation demonstrates state-of-the-art performance on quantitative metrics (e.g., FID, LPIPS), visual quality, and human preference studies. Moreover, our approach exhibits strong zero-shot generalization to unseen indoor layouts and generic scenes. The code and dataset are publicly released.

📝 Abstract
Virtual furniture synthesis, which seamlessly integrates reference objects into indoor scenes while maintaining geometric coherence and visual realism, holds substantial promise for home design and e-commerce applications. However, this field remains underexplored due to the scarcity of reproducible benchmarks and the limitations of existing image composition methods in achieving high-fidelity furniture synthesis while preserving background integrity. To overcome these challenges, we first present RoomBench++, a comprehensive and publicly available benchmark dataset tailored for this task. It consists of 112,851 training pairs and 1,832 testing pairs drawn from both real-world indoor videos and realistic home design renderings, thereby supporting robust training and evaluation under practical conditions. Then, we propose RoomEditor++, a versatile diffusion-based architecture featuring a parameter-sharing dual diffusion backbone, which is compatible with both U-Net and DiT architectures. This design unifies the feature extraction and inpainting processes for reference and background images. Our in-depth analysis reveals that the parameter-sharing mechanism enforces aligned feature representations, facilitating precise geometric transformations, texture preservation, and seamless integration. Extensive experiments validate that RoomEditor++ outperforms state-of-the-art approaches in quantitative metrics, qualitative assessments, and human preference studies, while highlighting its strong generalization to unseen indoor scenes and general scenes without task-specific fine-tuning. The dataset and source code are available at https://github.com/stonecutter-21/roomeditor.
Problem

Research questions and friction points this paper is trying to address.

Addresses high-fidelity furniture synthesis in indoor scenes
Overcomes lack of benchmarks for realistic furniture integration
Enhances geometric coherence and visual realism in synthesis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Parameter-sharing dual diffusion backbone architecture
Unified feature extraction and inpainting for reference and background
Generalizes to unseen scenes without task-specific fine-tuning
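The paper's implementation details are not reproduced on this page, but the core parameter-sharing idea can be illustrated with a toy sketch: a single set of weights (standing in for the shared diffusion backbone) encodes both the reference-furniture stream and the background-scene stream, so their features are produced by the same operator and live in a common space. All names below (`encode`, `W_shared`, the linear + ReLU step) are hypothetical illustrations, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # toy feature dimension

# One weight matrix shared by both branches -- the essence of parameter
# sharing: reference and background are embedded by the *same* operator.
W_shared = rng.standard_normal((d, d))

def encode(x, W=W_shared):
    """Toy 'backbone' step: a linear map followed by ReLU."""
    return np.maximum(W @ x, 0.0)

reference = rng.standard_normal(d)   # stand-in for reference-furniture input
background = rng.standard_normal(d)  # stand-in for background-scene input

# Dual branch, identical weights: two inputs, one set of parameters.
f_ref = encode(reference)
f_bg = encode(background)
print(f_ref.shape, f_bg.shape)  # one operator applied to two streams
```

In a real dual-diffusion setup the shared module would be the full denoising network (U-Net or DiT) rather than a single linear layer, but the structural point is the same: no second copy of the weights exists for the reference branch.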
Authors

Qilong Wang — Tianjin University (Deep Learning, Computer Vision)
Xiaofan Ming — School of Artificial Intelligence, Tianjin University, Tianjin 300350, China
Zhenyi Lin — School of Artificial Intelligence, Tianjin University, Tianjin 300350, China
Jinwen Li — School of Artificial Intelligence, Tianjin University, Tianjin 300350, China
Dongwei Ren — Tianjin University (Computer Vision, Deep Learning)
Wangmeng Zuo — School of Computer Science and Technology, Harbin Institute of Technology (Computer Vision, Image Processing, Generative AI, Deep Learning, Biometrics)
Qinghua Hu — Professor of Computer Science, Tianjin University (Machine Learning, Data Mining)