OccScene: Semantic Occupancy-based Cross-task Mutual Learning for 3D Scene Generation

📅 2024-12-15
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
Current 3D generation and perception tasks are largely disjoint, lacking a unified modeling framework. To address this, we propose OccScene—the first unified framework enabling joint, mutually reinforcing learning for text-driven high-fidelity 3D scene generation and semantic occupancy prediction. Our approach introduces: (1) a cross-task bidirectional mutual learning paradigm bridging generation and perception; (2) a Mamba-based Dual Alignment module that aligns diffusion latent representations with fine-grained semantic occupancy features; and (3) synergistic optimization wherein semantic occupancy guides generation while perception priors refine the generative process—jointly enhancing both generation diversity and occupancy prediction accuracy. Extensive experiments across indoor and outdoor scenes demonstrate that OccScene simultaneously achieves state-of-the-art 3D fidelity and semantic occupancy performance, with mIoU significantly surpassing single-task baselines.
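
For readers unfamiliar with the mIoU metric mentioned above, the following is a minimal sketch of how mean intersection-over-union can be computed for a voxel grid of semantic occupancy labels. It is not taken from the OccScene paper or its code. The function name `mean_iou`, the flattened-list input format, and the choice to average only over classes that actually occur are assumptions made for illustration; benchmarks differ in how they treat absent classes and ignored voxels.

```python
from typing import Optional, Sequence


def mean_iou(
    pred: Sequence[int],
    target: Sequence[int],
    num_classes: int,
    ignore_label: Optional[int] = None,
) -> float:
    """Mean IoU over classes that occur in the prediction or the ground truth.

    `pred` and `target` are flattened voxel grids holding one class id per
    voxel, with ids assumed to lie in [0, num_classes). Hypothetical helper
    for illustration only.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of voxels")

    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_label:
            continue  # skip voxels without a valid ground-truth label
        if p == t:
            # Correct voxel: counts once toward both intersection and union.
            intersection[t] += 1
            union[t] += 1
        else:
            # Wrong voxel: enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1

    # Average only over classes that appear at least once (assumed convention).
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    predicted = [0, 0, 1, 1, 2, 2]
    ground_truth = [0, 1, 1, 1, 2, 0]
    print(mean_iou(predicted, ground_truth, num_classes=3))
```

In the usage example, the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is approximately 0.5.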

📝 Abstract
Recent diffusion models have demonstrated remarkable performance in both 3D scene generation and perception tasks. Nevertheless, existing methods typically keep these two processes separate, with the generative model acting only as a data augmenter that produces synthetic data for downstream perception tasks. In this work, we propose OccScene, a novel mutual learning paradigm that integrates fine-grained 3D perception and high-quality generation in a unified framework, so that each task benefits the other. OccScene generates new, consistent, and realistic 3D scenes from text prompts alone, guided by semantic occupancy within a jointly trained diffusion framework. To align the occupancy with the diffusion latent, a Mamba-based Dual Alignment module is introduced to incorporate fine-grained semantics and geometry as perception priors. Within OccScene, the perception module is improved by customized and diverse generated scenes, while the perception priors in turn enhance generation quality. Extensive experiments show that OccScene achieves realistic 3D scene generation across a broad range of indoor and outdoor scenarios, while also yielding substantial improvements for perception models on the 3D task of semantic occupancy prediction.
Problem

Research questions and friction points this paper is trying to address.

Integrates 3D perception and generation in a unified framework
Generates realistic 3D scenes from text prompts
Improves semantic occupancy prediction through mutual learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Semantic occupancy-guided joint-training diffusion framework
Mamba-based Dual Alignment module for latent alignment
Cross-task mutual learning between generation and perception
Authors

Bohan Li
School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China, and Ningbo Institute of Digital Twin, Eastern Institute of Technology, Ningbo, China

Xin Jin
Ningbo Institute of Digital Twin, Eastern Institute of Technology, Ningbo, China

Jianan Wang
Astribot / IDEA / DeepMind / Oxford
Computer Vision, Generative AI, Robotics, Learning Theory

Yukai Shi
Astribot, Shenzhen, China

Yasheng Sun
Astribot, Shenzhen, China

Xiaofeng Wang
Astribot, Shenzhen, China

Zhuang Ma
The Wharton School, University of Pennsylvania
Machine Learning, Statistics

Baao Xie
Ningbo Institute of Digital Twin, Eastern Institute of Technology, Ningbo, China

Chao Ma
School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China

Xiaokang Yang
School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai, China

Wenjun Zeng
Ningbo Institute of Digital Twin, Eastern Institute of Technology, Ningbo, China