L-VAE: Variational Auto-Encoder with Learnable Beta for Disentangled Representation

📅 2025-07-03
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the problem of manual hyperparameter tuning required to balance reconstruction loss and disentanglement loss in disentangled representation learning. Methodologically, it proposes an end-to-end learnable dynamic weighting mechanism within the β-VAE framework: introducing differentiable, trainable loss weights and incorporating a gradient-aware regularization term to mitigate optimization bias in weight learning, thereby enabling joint optimization of model parameters and loss weights. The key contribution lies in embedding hyperparameters—specifically, loss weights—into the differentiable training pipeline, thus simultaneously optimizing for both disentanglement quality and reconstruction fidelity. Experiments demonstrate state-of-the-art or competitive disentanglement performance on standard benchmarks (e.g., dSprites, MPI3D), as measured by metrics such as DCI and MIG. Moreover, the method achieves effective unsupervised disentanglement of facial attributes—including pose and expression—on CelebA, validating its generalizability to real-world image data.


📝 Abstract
In this paper, we propose a novel model called Learnable VAE (L-VAE), which learns a disentangled representation together with the hyperparameters of the cost function. L-VAE can be considered an extension of β-VAE, wherein the hyperparameter β is empirically adjusted. L-VAE mitigates the limitations of β-VAE by learning the relative weights of the terms in the loss function to control the dynamic trade-off between disentanglement and reconstruction losses. In the proposed model, the weights of the loss terms and the parameters of the model architecture are learned concurrently. An additional regularization term is added to the loss function to prevent bias towards either the reconstruction or the disentanglement loss. Experimental analyses show that the proposed L-VAE finds an effective balance between reconstruction fidelity and disentangling the latent dimensions. Comparisons of the proposed L-VAE against β-VAE, VAE, ControlVAE, DynamicVAE, and σ-VAE on datasets such as dSprites, MPI3D-complex, Falcor3D, and Isaac3D reveal that L-VAE consistently provides the best or second-best performance measured by a set of disentanglement metrics. Moreover, qualitative experiments on the CelebA dataset confirm the success of the L-VAE model in disentangling facial attributes.
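The learnable-weight objective described above can be sketched in a few lines. Note this is a minimal illustration only: the softmax parameterization of the weights and the log-barrier form of the anti-bias regularizer are assumptions for the sketch, not the paper's exact formulation.

```python
import math

def lvae_loss(recon_loss, kl_loss, raw_weights, reg_coeff=0.1):
    """Illustrative L-VAE-style objective with trainable loss weights.

    `raw_weights` are unconstrained logits (trainable alongside the model
    parameters); a softmax keeps the effective weights positive and
    normalized. The log-barrier regularizer penalizes driving either
    weight toward zero, i.e., biasing training toward only one loss term.
    All names and the exact regularizer are assumptions of this sketch.
    """
    exps = [math.exp(w) for w in raw_weights]
    total = sum(exps)
    w_rec, w_kl = (e / total for e in exps)
    # Penalty grows without bound as either weight approaches 0,
    # discouraging collapse onto reconstruction or disentanglement alone.
    reg = -reg_coeff * (math.log(w_rec) + math.log(w_kl))
    return w_rec * recon_loss + w_kl * kl_loss + reg
```

In a real implementation the raw weights would be registered as trainable parameters in an autodiff framework so that they receive gradients jointly with the encoder and decoder.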
Problem

Research questions and friction points this paper is trying to address.

Learn the dynamic trade-off between disentanglement and reconstruction losses
Mitigate the limitations of empirically tuning the β-VAE hyperparameter
Balance reconstruction fidelity against disentanglement of latent dimensions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Learnable beta for dynamic loss trade-off
Concurrent learning of loss weights and model parameters
Regularization term prevents bias toward either loss term
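The "concurrent learning" idea can be illustrated with a single gradient step applied to both a toy model parameter and the learnable KL weight. This is a plain-Python sketch with hand-derived gradients on a scalar toy problem; the real model uses an autodiff framework and a full encoder/decoder, and the log-barrier penalty here is an assumed stand-in for the paper's regularizer.

```python
import math

def joint_step(theta, beta, target, kl, lam=0.1, lr=0.05):
    """One concurrent gradient step on a scalar 'model parameter' theta
    and the learnable KL weight beta (toy setup, illustrative only).

    total = recon + beta * kl - lam * log(beta)
    The -lam*log(beta) barrier keeps beta positive and stops the
    optimizer from zeroing out the disentanglement term.
    """
    recon = (theta - target) ** 2          # stand-in reconstruction loss
    total = recon + beta * kl - lam * math.log(beta)
    grad_theta = 2.0 * (theta - target)    # d(total)/d(theta)
    grad_beta = kl - lam / beta            # d(total)/d(beta)
    # Both quantities are updated in the same step: weights and model
    # parameters are optimized jointly, as the paper proposes.
    return theta - lr * grad_theta, beta - lr * grad_beta, total
```

Because the barrier gradient `-lam / beta` opposes the KL gradient, beta settles where the two balance rather than collapsing to zero.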
Hazal Mogultay Ozcan
Computer Engineering, Middle East Technical University, Ankara, 06800, Turkey.
Sinan Kalkan
Dept. of Computer Eng., Middle East Technical University
Computer Vision · Deep Learning · Robotics
Fatos T. Yarman-Vural
Computer Engineering, Middle East Technical University, Ankara, 06800, Turkey.