CrowdGaussian: Reconstructing High-Fidelity 3D Gaussians for Human Crowd from a Single Image

📅 2026-03-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
Reconstructing high-quality 3D models of multiple people from a single image is difficult because of severe occlusions, low image clarity, and numerous, varied appearances. This work proposes CrowdGaussian, a unified framework that reconstructs multi-person 3D Gaussian Splatting (3DGS) representations directly from a single image. To handle occlusions, a self-supervised adaptation pipeline enables a pretrained large human model to recover complete 3D humans with plausible geometry and appearance from heavily occluded inputs. Self-Calibrated Learning (SCL) then trains a single-step diffusion model to refine coarse renderings, and the refined outputs are distilled back into the multi-person 3DGS representation. The authors report photorealistic, geometrically coherent reconstructions of multi-person scenes.

📝 Abstract
Single-view 3D human reconstruction has garnered significant attention in recent years. Despite numerous advancements, prior research has concentrated on reconstructing 3D models from clear, close-up images of individual subjects, often yielding subpar results in the more prevalent multi-person scenarios. Reconstructing 3D human crowd models is a highly intricate task with several challenges: 1) extensive occlusions, 2) low image clarity, and 3) numerous and varied appearances. To address this task, we propose CrowdGaussian, a unified framework that directly reconstructs multi-person 3D Gaussian Splatting (3DGS) representations from single-image inputs. To handle occlusions, we devise a self-supervised adaptation pipeline that enables the pretrained large human model to reconstruct complete 3D humans with plausible geometry and appearance from heavily occluded inputs. Furthermore, we introduce Self-Calibrated Learning (SCL). This training strategy enables single-step diffusion models to adaptively refine coarse renderings to optimal quality by blending identity-preserving samples with clean/corrupted image pairs. The outputs can be distilled back to enhance the quality of multi-person 3DGS representations. Extensive experiments demonstrate that CrowdGaussian generates photorealistic, geometrically coherent reconstructions of multi-person scenes.
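
The abstract does not say how occluded training inputs or clean/corrupted image pairs are produced. As an illustration only, the sketch below shows one common generic approach, synthetic occlusion augmentation: blank out a random rectangle of a clean image and score a prediction only on the hidden pixels. The function names `make_occluded_pair` and `masked_l1`, the rectangular mask, and the `max_frac` and `fill` parameters are all assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def make_occluded_pair(image, rng, max_frac=0.5, fill=0.0):
    """Hypothetical helper: return (corrupted, clean, mask).

    A random axis-aligned rectangle covering at most `max_frac` of each
    side is overwritten with `fill` to simulate an occluder.
    """
    h, w = image.shape[:2]
    # Rectangle size: at least 1 pixel, at most max_frac of each side.
    oh = int(rng.integers(1, max(2, int(h * max_frac) + 1)))
    ow = int(rng.integers(1, max(2, int(w * max_frac) + 1)))
    # Top-left corner chosen so the rectangle stays inside the image.
    top = int(rng.integers(0, h - oh + 1))
    left = int(rng.integers(0, w - ow + 1))

    mask = np.zeros((h, w), dtype=bool)
    mask[top:top + oh, left:left + ow] = True

    corrupted = image.copy()
    corrupted[mask] = fill  # the 2D mask broadcasts over the channel axis
    return corrupted, image, mask


def masked_l1(pred, target, mask):
    """Mean absolute error restricted to the occluded pixels."""
    return float(np.abs(pred - target)[mask].mean())


rng = np.random.default_rng(0)
clean = np.ones((64, 48, 3), dtype=np.float32)
corrupted, target, mask = make_occluded_pair(clean, rng)
error_before_refinement = masked_l1(corrupted, target, mask)
```

Because the clean image is known, such a pair supplies its own supervision signal without manual labels, which is the general sense in which this kind of training is called self-supervised. Whether CrowdGaussian builds its pairs this way is not stated in the abstract.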

Problem

Research questions and friction points this paper is trying to address.

3D human reconstruction
single-view
crowd
occlusion
multi-person

Innovation

Methods, ideas, or system contributions that make the work stand out.

3D Gaussian Splatting
single-image reconstruction
self-supervised adaptation
Self-Calibrated Learning
human crowd modeling