🤖 AI Summary
This work addresses mutual information collapse in the β-VAE under strong regularization (β > 1), which leads to disentanglement failure and loss of semantic content. The collapse mechanism is rigorously analyzed from an information-theoretic perspective. To mitigate this issue, the authors propose λβ-VAE, a dual-parameter regularization framework that decouples disentanglement regularization from information loss by introducing an auxiliary L2 reconstruction penalty. Theoretical analysis, grounded in a linear-Gaussian model and mutual information metrics, shows that λβ-VAE recovers meaningful latent semantics. Empirical validation on dSprites, Shapes3D, and MPI3D-real shows that the method substantially widens the effective range of β, stabilizes disentanglement performance, and improves the semantic interpretability of latent representations.
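As a rough illustration of how such a dual-parameter objective could be wired up, here is a minimal PyTorch sketch. The function name, the Bernoulli reconstruction likelihood, and the exact way λ enters the loss are assumptions made for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def lambda_beta_vae_loss(x, x_recon, mu, logvar, beta=4.0, lam=1.0):
    """Hypothetical dual-parameter objective: a beta-VAE loss plus an
    auxiliary L2 reconstruction penalty weighted by `lam`."""
    batch = x.size(0)

    # Standard reconstruction term (here: Bernoulli likelihood on pixels in [0, 1]).
    recon_nll = F.binary_cross_entropy(x_recon, x, reduction="sum") / batch

    # Closed-form KL(q(z|x) || N(0, I)) for the diagonal Gaussian posterior
    # parameterized by `mu` and `logvar`.
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp()) / batch

    # Auxiliary L2 penalty intended to preserve latent informativeness
    # even when a large beta compresses the latent channel.
    l2_recon = F.mse_loss(x_recon, x, reduction="sum") / batch

    return recon_nll + beta * kl + lam * l2_recon
```

Setting `lam=0` recovers an ordinary β-VAE objective, which is what makes the two regularization pressures separable in this sketch.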
📝 Abstract
The $\beta$-VAE is a foundational framework for unsupervised disentanglement, using $\beta$ to regulate the trade-off between latent factorization and reconstruction fidelity. Empirically, however, disentanglement performance exhibits a pervasive non-monotonic trend: metrics such as MIG and SAP typically peak at intermediate $\beta$ and collapse as regularization increases. We demonstrate that this collapse is a fundamental information-theoretic failure, in which strong Kullback-Leibler pressure promotes marginal independence at the expense of the latent channel's semantic informativeness. By formalizing this mechanism in a linear-Gaussian setting, we prove that for $\beta>1$, stationarity-induced dynamics trigger a spectral contraction of the encoder gain, driving latent-factor mutual information to zero. To resolve this, we introduce the $\lambda\beta$-VAE, which decouples regularization pressure from informational collapse via an auxiliary $L_2$ reconstruction penalty $\lambda$. Extensive experiments on dSprites, Shapes3D, and MPI3D-real confirm that $\lambda>0$ stabilizes disentanglement and restores latent informativeness over a significantly broader range of $\beta$, providing, together with the theory, a principled justification for dual-parameter regularization in variational inference backbones.
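For concreteness, one plausible reading of the objective described above, written as a minimization and assuming the $\lambda$ term is simply added to the standard $\beta$-VAE loss (an inference from the abstract, not the paper's verbatim formulation):

```latex
% Assumed form of the lambda-beta-VAE objective (minimized over theta, phi);
% beta = 1, lambda = 0 recovers the standard VAE.
\mathcal{L}_{\lambda\beta}(\theta,\phi)
  = -\,\mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right]
  + \beta\, D_{\mathrm{KL}}\!\left(q_\phi(z \mid x) \,\middle\|\, p(z)\right)
  + \lambda\, \bigl\lVert x - \hat{x}_\theta(z) \bigr\rVert_2^2
```

Under this reading, $\beta$ controls the KL pressure that drives the spectral contraction in the linear-Gaussian analysis, while $\lambda$ independently penalizes the loss of reconstructive, and hence mutual, information.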