🤖 AI Summary
This work addresses the problem of manual hyperparameter tuning required to balance reconstruction loss and disentanglement loss in disentangled representation learning. Methodologically, it proposes an end-to-end learnable dynamic weighting mechanism within the β-VAE framework: introducing differentiable, trainable loss weights and incorporating a gradient-aware regularization term to mitigate optimization bias in weight learning, thereby enabling joint optimization of model parameters and loss weights. The key contribution lies in embedding hyperparameters—specifically, loss weights—into the differentiable training pipeline, thus simultaneously optimizing for both disentanglement quality and reconstruction fidelity. Experiments demonstrate state-of-the-art or competitive disentanglement performance on standard benchmarks (e.g., dSprites, MPI3D), as measured by metrics such as DCI and MIG. Moreover, the method achieves effective unsupervised disentanglement of facial attributes—including pose and expression—on CelebA, validating its generalizability to real-world image data.
📝 Abstract
In this paper, we propose a novel model called Learnable VAE (L-VAE), which learns a disentangled representation together with the hyperparameters of the cost function. L-VAE can be considered an extension of β-VAE, in which the hyperparameter β must be adjusted empirically. L-VAE mitigates this limitation of β-VAE by learning the relative weights of the terms in the loss function to control the dynamic trade-off between disentanglement and reconstruction losses. In the proposed model, the weights of the loss terms and the parameters of the model architecture are learned concurrently. An additional regularization term is added to the loss function to prevent bias towards either the reconstruction or the disentanglement loss. Experimental analyses show that the proposed L-VAE finds an effective balance between reconstruction fidelity and disentangling the latent dimensions. Comparisons of the proposed L-VAE against β-VAE, VAE, ControlVAE, DynamicVAE, and σ-VAE on datasets such as dSprites, MPI3D-complex, Falcor3D, and Isaac3D reveal that L-VAE consistently provides the best or second-best performance as measured by a set of disentanglement metrics. Moreover, qualitative experiments on the CelebA dataset confirm the success of the L-VAE model in disentangling facial attributes.
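The learnable-weight idea can be illustrated on a toy problem. The sketch below is **not** the paper's exact L-VAE formulation: the two loss terms, the `log`-parameterization of the weight, and the `(log β)²` regularizer are all simplified stand-ins chosen to show the mechanism, namely that a model parameter and a loss weight descend the same objective jointly, with the regularizer keeping the weight from collapsing toward pure reconstruction or pure disentanglement.

```python
import math

# Surrogate loss terms standing in for reconstruction and KL/disentanglement
# losses (in the real model these depend on encoder/decoder outputs).
def rec(theta):
    return (theta - 2.0) ** 2

def kl(theta):
    return (theta - 0.5) ** 2

def train(steps=2000, lr=0.01, lam=0.1):
    """Jointly minimize  rec(theta) + beta * kl(theta) + lam * (log beta)^2
    over theta and beta, with beta = exp(raw_beta) kept positive."""
    theta, raw_beta = 0.0, 0.0  # raw_beta = 0  =>  beta starts at 1
    for _ in range(steps):
        beta = math.exp(raw_beta)
        # Hand-derived gradients of the toy objective:
        d_theta = 2 * (theta - 2.0) + beta * 2 * (theta - 0.5)
        # d/d(raw_beta) of beta*kl is beta*kl itself (chain rule through exp);
        # the regularizer term lam * raw_beta^2 stops beta from drifting to 0.
        d_raw = beta * kl(theta) + 2 * lam * raw_beta
        theta -= lr * d_theta
        raw_beta -= lr * d_raw
    return theta, math.exp(raw_beta)

theta, beta = train()
```

Without the regularizer, `d_raw` is non-negative whenever `kl(theta) >= 0`, so the learned weight would slide monotonically toward zero and the model would optimize reconstruction alone; the extra penalty is what makes the trade-off settle at an interior balance, which mirrors the role of the regularization term described in the abstract.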