Closed-Loop Unsupervised Representation Disentanglement with β-VAE Distillation and Diffusion Probabilistic Feedback

📅 2024-02-04
🏛️ European Conference on Computer Vision
📈 Citations: 9
Influential: 0
🤖 AI Summary
Current unsupervised representation disentanglement faces three key bottlenecks: (1) reliance on synthetic or annotated data, limiting generalization to real-world scenarios; (2) hand-crafted constraints that hinder adaptive optimization; and (3) absence of evaluation metrics suitable for unlabeled real-world data. This paper proposes the first closed-loop disentanglement framework, integrating a diffusion autoencoder with β-VAE to achieve adaptive, interpretable semantic factor disentanglement via latent distillation and diffusion-based feedback. Our contributions include: (1) introducing the first closed-loop learning paradigm for disentanglement; (2) proposing a novel, content-tracking-based evaluation metric for unsupervised disentanglement on unlabeled data; and (3) designing a self-supervised navigation strategy to identify interpretable semantic directions in latent space. Extensive experiments on real-image editing and visual analysis tasks demonstrate significant improvements over state-of-the-art methods, validating both the generalizability and practical utility of unsupervised disentanglement in natural scenes.

📝 Abstract
Representation disentanglement may help AI fundamentally understand the real world and thus benefit both discrimination and generation tasks. It currently has at least three unresolved core issues: (i) heavy reliance on label annotation and synthetic data, causing poor generalization in natural scenarios; (ii) heuristic/hand-crafted disentangling constraints that make it hard to adaptively achieve an optimal training trade-off; (iii) the lack of a reasonable evaluation metric, especially for real label-free data. To address these challenges, we propose a Closed-Loop unsupervised representation Disentanglement approach dubbed CL-Dis. Specifically, we use a diffusion-based autoencoder (Diff-AE) as the backbone while resorting to β-VAE as a co-pilot to extract semantically disentangled representations. The strong generation ability of the diffusion model and the good disentanglement ability of the VAE model are complementary. To strengthen disentangling, VAE-latent distillation and diffusion-wise feedback are interconnected in a closed-loop system for further mutual promotion. Then, a self-supervised Navigation strategy is introduced to identify interpretable semantic directions in the disentangled latent space. Finally, a new metric based on content tracking is designed to evaluate the disentanglement effect. Experiments demonstrate the superiority of CL-Dis on applications such as real image manipulation and visual analysis.
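As a rough illustration of the training objective the abstract implies, the sketch below combines a β-VAE KL term with a latent-distillation term that pulls the β-VAE code toward a projection of the Diff-AE semantic code. All shapes, the projection `W`, and the weighting are hypothetical stand-ins for illustration only, not the paper's actual losses.

```python
import numpy as np

rng = np.random.default_rng(0)

def beta_vae_kl(mu, logvar):
    # KL(q(z|x) || N(0, I)) per sample, summed over latent dimensions.
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=1)

def distill_loss(z_vae, z_sem, W):
    # Pull the beta-VAE latent toward a linear projection of the Diff-AE
    # semantic code; W is a hypothetical stand-in for a distillation head.
    return np.mean((z_vae - z_sem @ W) ** 2, axis=1)

# Toy batch: 4 images, 8-d Diff-AE semantic code, 4-d beta-VAE latent.
z_sem = rng.normal(size=(4, 8))
W = rng.normal(size=(8, 4)) * 0.1
mu = rng.normal(size=(4, 4)) * 0.5
logvar = np.full((4, 4), -1.0)
z_vae = mu + np.exp(0.5 * logvar) * rng.normal(size=(4, 4))  # reparameterization

beta = 4.0  # beta > 1 pressures the VAE toward factorized (disentangled) codes
per_sample = beta * beta_vae_kl(mu, logvar) + distill_loss(z_vae, z_sem, W)
print(per_sample.shape)
```

In the paper's closed loop, a reconstruction/feedback term from the diffusion decoder would be added on top of these two terms; it is omitted here because its exact form is not given in the abstract.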
Problem

Research questions and friction points this paper is trying to address.

Addresses poor generalization in representation disentanglement on natural data
Solves heuristic constraints preventing optimal training trade-off adaptation
Develops evaluation metric for unsupervised disentanglement without labels
Innovation

Methods, ideas, or system contributions that make the work stand out.

Closed-loop system with VAE distillation and diffusion feedback
Self-supervised navigation for semantic direction identification
Content-tracking metric for disentanglement evaluation
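The self-supervised navigation idea above can be caricatured with PCA: treat high-variance directions of a collection of latent codes as candidate semantic axes. This is only a loose analogue of the paper's strategy, sketched on made-up data with hypothetical dimensions.

```python
import numpy as np

rng = np.random.default_rng(1)

def find_semantic_directions(latents, k=2):
    """Return the top-k principal directions of a set of latent codes.

    PCA serves here as a simple stand-in for self-supervised navigation:
    directions of maximal variance are taken as candidate semantic axes.
    """
    centered = latents - latents.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k]  # (k, dim) unit-norm directions

# Toy latent codes: variance deliberately concentrated on dimension 2.
codes = rng.normal(size=(100, 6)) * 0.1
codes[:, 2] += rng.normal(size=100) * 2.0  # one dominant hypothetical factor
dirs = find_semantic_directions(codes, k=1)
print(np.abs(dirs[0]).argmax())  # index of the dominant latent axis
```

Walking a latent code along such a direction (`z + alpha * dirs[0]`) and decoding is the usual way a discovered axis is inspected for an interpretable effect.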
Xin Jin
Ningbo Institute of Digital Twin, Eastern Institute of Technology, Ningbo, China
Bohan Li
Ningbo Institute of Digital Twin, Eastern Institute of Technology, Ningbo, China; Shanghai Jiao Tong University, Shanghai, China
Baao Xie
Ningbo Institute of Digital Twin, Eastern Institute of Technology, Ningbo, China
Wenyao Zhang
PhD Student, Shanghai Jiao Tong University
Robot Learning, Representation Learning
Jinming Liu
Shanghai Jiao Tong University
VLM, LLM, Computer Vision, Image/Video Compression
Ziqiang Li
Associate Professor, Nanjing University of Information Science and Technology
AIGC, Backdoor Learning, AI Security
Tao Yang
Xi’an Jiaotong University, Xi’an, China
Wenjun Zeng
Ningbo Institute of Digital Twin, Eastern Institute of Technology, Ningbo, China