MD-Face: MoE-Enhanced Label-Free Disentangled Representation for Interactive Facial Attribute Editing

📅 2026-04-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenges of attribute entanglement and high annotation costs in unsupervised facial attribute editing by proposing the first mixture-of-experts (MoE)-based disentangled representation learning framework. The method employs a dynamic gating mechanism to assign specialized experts, enabling the learning of semantically independent latent vectors. To enhance editability, it introduces a geometry-aware loss to align semantic boundary vectors (SBVs) and incorporates a Jacobian push-forward technique to improve control precision. Evaluated on ProGAN and StyleGAN, the approach significantly outperforms existing unsupervised baselines and achieves performance comparable to supervised methods. Moreover, compared to diffusion-based models, it delivers higher image fidelity with substantially lower inference latency, making it suitable for high-quality, real-time interactive editing.
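The dynamic gating idea described above can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the linear experts, the softmax gate, the top-k routing rule, and every name (`moe_forward`, `gate_weights`, `expert_weights`, `top_k`) are assumptions chosen for illustration, since the summary does not specify the architecture.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def moe_forward(z, gate_weights, expert_weights, top_k=2):
    """Route latent code z through the top_k experts chosen by a softmax gate.

    Assumption: each expert is a single linear map, and the output is the
    gate-weighted sum of the selected experts (weights renormalized to 1).
    """
    gate_probs = softmax(matvec(gate_weights, z))
    # Indices of the top_k experts with the highest gate probability.
    chosen = sorted(range(len(gate_probs)),
                    key=lambda i: gate_probs[i], reverse=True)[:top_k]
    norm = sum(gate_probs[i] for i in chosen)
    out = [0.0] * len(expert_weights[0])
    for i in chosen:
        expert_out = matvec(expert_weights[i], z)
        weight = gate_probs[i] / norm
        out = [o + weight * e for o, e in zip(out, expert_out)]
    return out, chosen, gate_probs


if __name__ == "__main__":
    random.seed(0)
    latent_dim, out_dim, n_experts = 4, 5, 3  # hypothetical sizes
    z = [random.gauss(0, 1) for _ in range(latent_dim)]
    gate_w = [[random.gauss(0, 1) for _ in range(latent_dim)]
              for _ in range(n_experts)]
    expert_w = [[[random.gauss(0, 1) for _ in range(latent_dim)]
                 for _ in range(out_dim)] for _ in range(n_experts)]
    vec, chosen, probs = moe_forward(z, gate_w, expert_w)
    print("experts used:", chosen)
```

Because the gate depends on the input, different latent codes can activate different experts, which is the property the summary credits for learning more independent semantic vectors.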

📝 Abstract

GAN-based facial attribute editing is widely used in virtual avatars and social media but often suffers from attribute entanglement, where modifying one facial attribute unintentionally alters others. While supervised disentangled representation learning can address this, it relies heavily on labeled data, incurring high annotation costs. To address these challenges, we propose MD-Face, a label-free disentangled representation learning framework based on Mixture of Experts (MoE). MD-Face uses an MoE backbone with a gating mechanism that dynamically allocates experts, enabling the model to learn semantic vectors with greater independence. To further reduce attribute entanglement, we introduce a geometry-aware loss, which aligns each semantic vector with its corresponding Semantic Boundary Vector (SBV) through a Jacobian-based pushforward method. Experiments with ProGAN and StyleGAN show that MD-Face outperforms unsupervised baselines and is competitive with supervised ones. Compared to diffusion-based methods, it offers better image quality and lower inference latency, making it well suited to interactive editing.
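To make the Jacobian-based pushforward concrete, here is a small stdlib-only sketch. The abstract does not give the loss formula, so everything here is an assumption: the pushforward J(z)·v is approximated with central finite differences, the alignment term is taken to be one minus cosine similarity, and `toy_generator`, `pushforward`, and `alignment_loss` are hypothetical names standing in for the real generator mapping and loss.

```python
import math


def pushforward(g, z, v, eps=1e-5):
    """Approximate the Jacobian-vector product J_g(z) @ v.

    Uses a central finite difference, so g is only evaluated twice and the
    full Jacobian is never built.
    """
    z_plus = [zi + eps * vi for zi, vi in zip(z, v)]
    z_minus = [zi - eps * vi for zi, vi in zip(z, v)]
    return [(a - b) / (2 * eps) for a, b in zip(g(z_plus), g(z_minus))]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def alignment_loss(g, z, v, sbv):
    """Assumed loss: 0 when the pushed-forward direction is parallel to the
    SBV, 2 when it points the opposite way."""
    return 1.0 - cosine(pushforward(g, z, v), sbv)


def toy_generator(z):
    """Hypothetical smooth map from a 2-D latent space to a 3-D space."""
    return [math.tanh(z[0] + 2.0 * z[1]), z[0] * z[1], math.sin(z[1])]


if __name__ == "__main__":
    z = [0.0, 0.0]
    semantic_vector = [1.0, 0.0]      # candidate editing direction
    boundary_vector = [1.0, 0.0, 0.0]  # stand-in for an SBV
    print(alignment_loss(toy_generator, z, semantic_vector, boundary_vector))
```

In a real training setup an automatic-differentiation framework would typically compute the Jacobian-vector product exactly; finite differences are used here only to keep the sketch self-contained.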

Problem

Research questions and friction points this paper is trying to address.

attribute entanglement

disentangled representation

label-free learning

facial attribute editing

annotation cost

Innovation

Methods, ideas, or system contributions that make the work stand out.

Mixture of Experts

disentangled representation

label-free learning