Phase-Type Variational Autoencoders for Heavy-Tailed Data

📅 2026-03-02

🤖 AI Summary

Standard variational autoencoders struggle to model real-world heavy-tailed data because they rely on light-tailed decoder distributions such as the Gaussian, while existing heavy-tailed extensions are restricted to predefined distribution families whose tail behavior is fixed in advance. This work proposes the Phase-Type Variational Autoencoder (PH-VAE), which, to the authors' knowledge, is the first to bring phase-type distributions into deep generative modeling. The decoder is defined as the absorption time of a continuous-time Markov chain, so PH-VAE combines exponential distributions at multiple time scales and learns its tail behavior from the data. In multivariate settings, it captures cross-dimensional tail dependence through shared latent variables while remaining analytically tractable. Experiments on synthetic and real-world datasets show that PH-VAE significantly outperforms Gaussian, Student’s t, and extreme-value-based baselines, particularly in tail modeling and extreme quantile estimation.
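
To make the "absorption time of a continuous-time Markov chain" idea concrete, here is a minimal sampling sketch in plain Python. It is an illustration under stated assumptions, not the paper's implementation: the function name `sample_absorption_time`, the two-phase example parameters, and the fixed (not latent-conditioned) `alpha` and `T` are all hypothetical. `alpha` is the initial distribution over transient phases and `T` is the sub-generator matrix, whose row sums are minus the rates of jumping to the absorbing state.

```python
import random


def sample_absorption_time(alpha, T, rng):
    """Draw one absorption time of a CTMC (hypothetical helper, for illustration).

    alpha: initial probabilities over the p transient phases (sums to 1).
    T:     p x p sub-generator; T[i][i] < 0, off-diagonals >= 0, row sums <= 0.
    rng:   a random.Random instance.
    """
    p = len(alpha)
    # Choose the starting phase according to alpha.
    state = rng.choices(range(p), weights=alpha)[0]
    elapsed = 0.0
    while True:
        # Holding time in the current phase is exponential with rate -T[i][i].
        elapsed += rng.expovariate(-T[state][state])
        # Rate of jumping straight to absorption is minus the row sum of T.
        exit_rate = max(0.0, -sum(T[state]))
        # Jump weights: other transient phases first, absorbing state last.
        weights = [T[state][j] if j != state else 0.0 for j in range(p)]
        weights.append(exit_rate)
        nxt = rng.choices(range(p + 1), weights=weights)[0]
        if nxt == p:  # absorbed: the elapsed time is one PH sample
            return elapsed
        state = nxt


# Assumed two-phase example: a fast phase that either exits or hands over
# to a slow phase, so the sample mixes two exponential time scales.
alpha = [1.0, 0.0]
T = [[-2.0, 1.0],
     [0.0, -0.5]]

rng = random.Random(0)
samples = [sample_absorption_time(alpha, T, rng) for _ in range(20000)]
empirical_mean = sum(samples) / len(samples)
# Theoretical mean for this chain: 1/2 + (1/2) * (1/0.5) = 1.5
```

In a PH-VAE-style decoder, `alpha` and `T` would presumably be produced by a neural network from the latent variable rather than fixed by hand as they are here.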

📝 Abstract

Heavy-tailed distributions are ubiquitous in real-world data, where rare but extreme events dominate risk and variability. However, standard Variational Autoencoders (VAEs) employ simple decoder distributions (e.g., Gaussian) that fail to capture heavy-tailed behavior, while existing heavy-tail-aware extensions remain restricted to predefined parametric families whose tail behavior is fixed a priori. We propose the Phase-Type Variational Autoencoder (PH-VAE), whose decoder distribution is a latent-conditioned Phase-Type (PH) distribution defined as the absorption time of a continuous-time Markov chain (CTMC). This formulation composes multiple exponential time scales, yielding a flexible and analytically tractable decoder that adapts its tail behavior directly from the observed data. Experiments on synthetic and real-world benchmarks demonstrate that PH-VAE accurately recovers diverse heavy-tailed distributions, significantly outperforming Gaussian, Student-t, and extreme-value-based VAE decoders in modeling tail behavior and extreme quantiles. In multivariate settings, PH-VAE captures realistic cross-dimensional tail dependence through its shared latent representation. To our knowledge, this is the first work to integrate Phase-Type distributions into deep generative modeling, bridging applied probability and representation learning.
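
The analytical tractability mentioned above comes from the closed-form expressions available for any phase-type distribution. The block below states them in common textbook notation, which is an assumption here and may differ from the paper's own symbols: alpha is the initial probability row vector over p transient phases (assumed to sum to 1, so there is no point mass at zero), T is the p x p sub-generator matrix, 1 is a column vector of ones, and t is the vector of exit rates into the absorbing state. In a latent-conditioned decoder, alpha and T would presumably depend on the latent variable z.

```latex
\begin{aligned}
f(x) &= \boldsymbol{\alpha}\, e^{\mathbf{T}x}\, \mathbf{t}, \qquad x \ge 0, \\
F(x) &= 1 - \boldsymbol{\alpha}\, e^{\mathbf{T}x}\, \mathbf{1}, \\
\mathbf{t} &= -\mathbf{T}\mathbf{1}, \\
\mathbb{E}[X] &= -\boldsymbol{\alpha}\, \mathbf{T}^{-1} \mathbf{1}.
\end{aligned}
```

Because the density is an explicit function of alpha and T through a matrix exponential, the reconstruction term of a VAE objective can be evaluated exactly for each observation, with no sampling needed inside the decoder likelihood.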

Problem

Research questions and friction points this paper is trying to address.

heavy-tailed distributions

Variational Autoencoders

tail behavior

extreme events

decoder distribution

Innovation

Methods, ideas, or system contributions that make the work stand out.

Phase-Type distribution

Variational Autoencoder

Heavy-tailed modeling