🤖 AI Summary
3D scene generation still struggles with limited diversity, low visual fidelity, and poor view consistency, which hinders its use in immersive media, robotics, autonomous driving, and embodied AI. This paper presents a systematic survey organized around four paradigms: procedural, neural 3D-based, image-based, and video-based generation. It analyzes the technical foundations, trade-offs, and representative results of each paradigm, covering methods built on GANs, diffusion models, NeRF, and 3D Gaussians, and reviews commonly used datasets, evaluation protocols, and downstream applications. It also discusses open challenges in generation capacity, 3D representation, data and annotations, and evaluation, and outlines promising directions: higher fidelity, physics-aware and interactive generation, and unified perception-generation models. To track ongoing developments, the authors maintain an up-to-date project page.
📝 Abstract
3D scene generation seeks to synthesize spatially structured, semantically meaningful, and photorealistic environments for applications such as immersive media, robotics, autonomous driving, and embodied AI. Early methods based on procedural rules offered scalability but limited diversity. Recent advances in deep generative models (e.g., GANs, diffusion models) and 3D representations (e.g., NeRF, 3D Gaussians) have enabled the learning of real-world scene distributions, improving fidelity, diversity, and view consistency. More recently, diffusion models have bridged 3D scene synthesis and photorealism by reframing generation as an image or video synthesis problem. This survey provides a systematic overview of state-of-the-art approaches, organizing them into four paradigms: procedural generation, neural 3D-based generation, image-based generation, and video-based generation. We analyze their technical foundations, trade-offs, and representative results, and review commonly used datasets, evaluation protocols, and downstream applications. We conclude by discussing key challenges in generation capacity, 3D representation, data and annotations, and evaluation, and outline promising directions including higher fidelity, physics-aware and interactive generation, and unified perception-generation models. This review organizes recent advances in 3D scene generation and highlights promising directions at the intersection of generative AI, 3D vision, and embodied intelligence. To track ongoing developments, we maintain an up-to-date project page: https://github.com/hzxie/Awesome-3D-Scene-Generation.