SceneMaker: Open-set 3D Scene Generation with Decoupled De-occlusion and Pose Estimation Model

📅 2025-12-11
🤖 AI Summary
To address the geometric distortion and inaccurate pose estimation caused by severe occlusion in open-set scenarios, this paper proposes a decoupled 3D scene generation framework that separates occlusion removal from 3D object generation to reduce their mutual interference. The authors design a global-local attention mechanism that combines self-attention and cross-attention to make pose estimation more robust, and introduce OpenScene3D, described as the first large-scale synthetic dataset tailored for open-set 3D scene composition. They also propose a multi-scale unified pose estimation model. The method is trained jointly on RGB images and dedicated de-occlusion supervision signals, and is reported to improve geometric completeness and pose accuracy across diverse indoor and outdoor open-set scenes, outperforming state-of-the-art methods. The code and the OpenScene3D dataset are publicly released.

📝 Abstract
We propose SceneMaker, a decoupled 3D scene generation framework. Because they lack sufficient open-set de-occlusion and pose estimation priors, existing methods struggle to produce high-quality geometry and accurate poses simultaneously under severe occlusion and in open-set settings. To address these issues, we first decouple the de-occlusion model from 3D object generation and strengthen it with image datasets and collected de-occlusion datasets that cover much more diverse open-set occlusion patterns. We then propose a unified pose estimation model that integrates global and local mechanisms for both self-attention and cross-attention to improve accuracy. In addition, we construct an open-set 3D scene dataset to further extend the generalization of the pose estimation model. Comprehensive experiments demonstrate the superiority of our decoupled framework on both indoor and open-set scenes. Our code and datasets are released at https://idea-research.github.io/SceneMaker/.
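The decoupling described in the abstract can be pictured as three independent stages applied to each object. The sketch below is only an illustration of that idea, not the released implementation: the function `generate_scene`, the `PlacedObject` record, the stage signatures, and the choice of inputs to the pose stage are all hypothetical names and assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence


@dataclass
class PlacedObject:
    """One object of the output scene (hypothetical record, not from the paper)."""
    source_crop: Any  # the occluded image crop the object came from
    geometry: Any     # stand-in for a mesh or other 3D representation
    pose: Any         # stand-in for the estimated object pose


def generate_scene(
    occluded_crops: Sequence[Any],
    deocclude: Callable[[Any], Any],
    generate_3d: Callable[[Any], Any],
    estimate_pose: Callable[[Any, Any], Any],
) -> List[PlacedObject]:
    """Run the three stages separately for every object crop.

    Assumption: the pose stage receives the original crop and the generated
    geometry. The abstract does not state the actual inputs.
    """
    scene: List[PlacedObject] = []
    for crop in occluded_crops:
        completed = deocclude(crop)            # stage 1: complete the occluded image
        geometry = generate_3d(completed)      # stage 2: 3D generation sees only the completed image
        pose = estimate_pose(crop, geometry)   # stage 3: pose is estimated by a separate model
        scene.append(PlacedObject(crop, geometry, pose))
    return scene
```

Because each stage is passed in as a plain callable, any one of them can be retrained or swapped without touching the other two, which is the practical benefit a decoupled design is meant to provide.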
Problem

Research questions and friction points this paper is trying to address.

Addresses open-set 3D scene generation with decoupled de-occlusion and pose estimation.
Improves geometry quality and pose accuracy under severe occlusion conditions.
Enhances generalization using diverse datasets and unified attention mechanisms.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decoupled de-occlusion model using diverse open-set occlusion patterns
Unified pose estimation model integrating global and local attention mechanisms
Open-set 3D scene dataset construction to enhance model generalization
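The summary does not say how the global and local attention mechanisms are implemented. As one possible reading, the sketch below assumes that "global" attention lets every token attend to all tokens, while "local" attention restricts each token to tokens belonging to the same object. The helper names `attend`, `global_mask`, and `local_mask` are hypothetical, and the code is plain scaled dot-product attention with a boolean mask rather than the paper's model.

```python
import math
from typing import List, Sequence

Vector = List[float]


def attend(
    queries: Sequence[Vector],
    keys: Sequence[Vector],
    values: Sequence[Vector],
    allowed: Sequence[Sequence[bool]],
) -> List[Vector]:
    """Scaled dot-product attention; allowed[i][j] says whether query i may see key j.

    Assumes every query has at least one allowed key.
    """
    scale = math.sqrt(len(keys[0]))
    dim = len(values[0])
    outputs: List[Vector] = []
    for i, query in enumerate(queries):
        visible = [j for j in range(len(keys)) if allowed[i][j]]
        scores = [sum(a * b for a, b in zip(query, keys[j])) / scale for j in visible]
        peak = max(scores)  # subtract the maximum for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        outputs.append(
            [sum(w * values[j][d] for w, j in zip(weights, visible)) / total
             for d in range(dim)]
        )
    return outputs


def global_mask(num_queries: int, num_keys: int) -> List[List[bool]]:
    """Global attention: every query sees every key."""
    return [[True] * num_keys for _ in range(num_queries)]


def local_mask(query_groups: Sequence[int], key_groups: Sequence[int]) -> List[List[bool]]:
    """Local attention (assumed meaning): a query sees only keys with the same object id."""
    return [[q == k for k in key_groups] for q in query_groups]
```

Passing the same token list as queries, keys, and values gives self-attention; passing object tokens as queries and image tokens as keys and values gives cross-attention. Either form can be combined with either mask, which covers the four combinations the summary mentions.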
Authors

Yukai Shi
Tsinghua University
Weiyu Li
The Hong Kong University of Science and Technology
Computer Graphics, Neural Rendering, 3D Content Generation
Zihao Wang
LightIllusions
Hongyang Li
IDEA Research
Xingyu Chen
IDEA Research
Ping Tan
Hong Kong University of Science and Technology (HKUST)
Computer Vision, Computer Graphics
Lei Zhang
IDEA Research