🤖 AI Summary
Existing LoRA variants lack explicit mechanisms for modeling task-relevant information, limiting the representational capacity of low-rank subspaces. To address this, we propose FVAE-LoRA, the first integration of variational autoencoders (VAEs) into the low-rank adaptation framework. By designing a novel evidence lower bound (ELBO) objective, FVAE-LoRA learns two disentangled latent spaces: one dedicated to task-specific features and the other capturing residual structural information. This enables explicit separation of semantic signals from noise and bias within low-rank updates. Extensive experiments demonstrate that FVAE-LoRA consistently outperforms standard LoRA on downstream tasks spanning the text, audio, and image modalities, and exhibits superior generalization and robustness under distribution shift.
📝 Abstract
Low-rank adaptation (LoRA) is a widely used method for parameter-efficient finetuning. However, existing LoRA variants lack mechanisms to explicitly disambiguate task-relevant information within the learned low-rank subspace, potentially limiting downstream performance. We propose Factorized Variational Autoencoder LoRA (FVAE-LoRA), which leverages a VAE to learn two distinct latent spaces. Our novel Evidence Lower Bound formulation explicitly promotes factorization between the latent spaces, dedicating one latent space to task-salient features and the other to residual information. Extensive experiments on text, audio, and image tasks demonstrate that FVAE-LoRA consistently outperforms standard LoRA. Moreover, spurious correlation evaluations confirm that FVAE-LoRA better isolates task-relevant signals, leading to improved robustness under distribution shifts. Our code is publicly available at: https://github.com/idiap/FVAE-LoRA
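As a rough illustration of the idea, a two-latent-space objective of this kind can be sketched as a standard Gaussian VAE ELBO plus one extra term: a reconstruction loss, a KL term for each latent, and a penalty that discourages the two latents from sharing information. The toy NumPy sketch below uses random linear encoders/decoders and a cross-covariance penalty as the factorization term; the dimensions, weights, and exact penalty are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

d_x, d_z = 8, 4          # input dim, per-latent dim (toy sizes)
n = 32                   # batch size

# Random linear encoders/decoder as stand-ins for learned layers.
W_task = rng.normal(size=(d_x, 2 * d_z)) * 0.1   # -> (mu1, logvar1), "task" latent
W_res  = rng.normal(size=(d_x, 2 * d_z)) * 0.1   # -> (mu2, logvar2), "residual" latent
W_dec  = rng.normal(size=(2 * d_z, d_x)) * 0.1   # decodes [z1, z2] -> x_hat

def encode(x, W):
    h = x @ W
    return h[:, :d_z], h[:, d_z:]                # mean, log-variance

def kl_std_normal(mu, logvar):
    # KL(N(mu, sigma^2) || N(0, I)), summed over latent dims, averaged over batch.
    return 0.5 * np.mean(np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=1))

def cross_cov_penalty(z1, z2):
    # Squared cross-covariance between the two latents: small when they
    # carry decorrelated (factorized) information. Illustrative choice only.
    z1c, z2c = z1 - z1.mean(0), z2 - z2.mean(0)
    c = z1c.T @ z2c / (len(z1) - 1)
    return np.sum(c**2)

x = rng.normal(size=(n, d_x))
mu1, lv1 = encode(x, W_task)
mu2, lv2 = encode(x, W_res)
# Reparameterization trick: z = mu + sigma * eps.
z1 = mu1 + np.exp(0.5 * lv1) * rng.normal(size=mu1.shape)
z2 = mu2 + np.exp(0.5 * lv2) * rng.normal(size=mu2.shape)
x_hat = np.concatenate([z1, z2], axis=1) @ W_dec

recon = np.mean(np.sum((x - x_hat) ** 2, axis=1))
loss = (recon
        + kl_std_normal(mu1, lv1)
        + kl_std_normal(mu2, lv2)
        + cross_cov_penalty(z1, z2))
print(float(loss))
```

In training, minimizing such a loss would push reconstruction through both latents while the KL and cross-covariance terms keep them regularized and mutually decorrelated, which is the intuition behind dedicating one latent to task-salient features and the other to residual information.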