COSY: Compositional 3DGS Synthesis for Disentangled Human Head Editing

📅 2026-05-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing 3D Gaussian Splatting (3DGS)-based generative models for human heads often suffer from entangled latent representations, so editing a semantic attribute such as hair color or eyewear can unintentionally alter identity or appearance. To address this, the work proposes a 3DGS-GAN generator architecture that synthesizes head components (hair, skin, glasses, and torso) independently of one another, without requiring segmentation masks or geometric priors. The separation is learned from sparse information only, such as the hair or skin color. Context tokens share a minimal amount of information between the independent component generators to keep shape and lighting consistent during editing, and they also allow control over shape and light without prior annotation. Compared to existing GAN-based generation and editing methods, the authors report better disentanglement, more precise editing control, and competitive visual quality.
📝 Abstract
Recent 3D Gaussian Splatting (3DGS) GANs for human heads synthesize and render photorealistic 3D models in real time and offer a vast variety in identity and appearance. However, controlling specific semantic attributes such as hair color or glasses remains challenging, as edits in the entangled latent space often induce unintended changes in identity or appearance. Although several methods aim to disentangle the latent space after training by estimating directions that modify only certain features, these methods cannot guarantee complete disentanglement and often require pre-trained classifiers. We propose a new generator architecture that synthesizes components, such as hair, skin, glasses, and torso, completely independently. This allows the latent vector for one region to be changed while the remaining parts stay fixed. Further, we achieve this separation using only sparse information such as the hair or skin color, eliminating the segmentation masks or geometric priors often required in prior work. To ensure matching shape and lighting conditions during editing, we allow minimal shared information via context tokens between the independent generators. These tokens even allow us to control the shape and light without any prior annotation. Compared to existing works on GAN-based generation and editing, our method shows better disentanglement, more precise editing control, and competitive visual quality.
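The compositional idea in the abstract (one latent per component, plus a small shared context) can be illustrated with a toy sketch. This is not the paper's network: the function names, the latent sizes, and the use of a seeded pseudo-random generator in place of a learned generator are all assumptions made for illustration. It only shows why resampling one component's latent leaves the other components untouched by construction.

```python
import random

# Component names follow the abstract; everything else here is hypothetical.
COMPONENTS = ("hair", "skin", "glasses", "torso")


def toy_component_generator(name, latent, context, n_gaussians=4):
    """Stand-in for one learned component generator (assumption, not the paper's model).

    The output depends only on this component's own latent and the shared
    context token, never on the latents of other components.
    """
    rng = random.Random(f"{name}|{latent}|{context}")
    return [
        {
            "component": name,
            "position": tuple(rng.uniform(-1.0, 1.0) for _ in range(3)),
            "color": tuple(rng.random() for _ in range(3)),
        }
        for _ in range(n_gaussians)
    ]


def generate_head(latents, context):
    """Compose the head as the union of all per-component toy Gaussians."""
    gaussians = []
    for name in COMPONENTS:
        gaussians.extend(toy_component_generator(name, latents[name], context))
    return gaussians


def sample_latent(rng, dim=8):
    """Draw a toy latent vector from a standard normal distribution."""
    return tuple(rng.gauss(0.0, 1.0) for _ in range(dim))


def by_component(gaussians, name):
    """Select the Gaussians that belong to one component."""
    return [g for g in gaussians if g["component"] == name]


rng = random.Random(0)
latents = {name: sample_latent(rng) for name in COMPONENTS}
context = sample_latent(rng, dim=2)  # shared token, e.g. for shape and lighting

original = generate_head(latents, context)

# Edit: resample only the hair latent, keep all other latents and the context.
edited_latents = dict(latents, hair=sample_latent(rng))
edited = generate_head(edited_latents, context)

unchanged = [
    name for name in COMPONENTS
    if by_component(original, name) == by_component(edited, name)
]
print("unchanged components:", unchanged)
```

In this sketch, changing `context` instead of a component latent would alter every component at once, which mirrors the role the abstract assigns to context tokens as the only shared information.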
Problem

Research questions and friction points this paper is trying to address.

3D Gaussian Splatting
disentangled editing
human head synthesis
latent space entanglement
semantic attribute control
Innovation

Methods, ideas, or system contributions that make the work stand out.

Compositional Generation
Disentangled Editing
3D Gaussian Splatting
Context Tokens
Semantic Decomposition