VAE-Inf: A statistically interpretable generative paradigm for imbalanced classification

📅 2026-04-28

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses unstable decision boundaries and unreliable error control in classification under extreme class imbalance by proposing VAE-Inf, a two-stage framework. First, a variational autoencoder is trained exclusively on the majority class, and its latent posteriors are aggregated into a global Gaussian reference model. Then the encoder is fine-tuned on a small number of minority-class samples with a distribution-aware loss that enforces separation between the classes. The method combines generative modeling with distribution-free statistical testing: a variance-normalized projection statistic serves as the test score, and a calibration step gives exact finite-sample control of the Type-I error (false positive rate) together with a geometrically interpretable decision rule. Experiments on multiple real-world imbalanced datasets report performance competitive with other approaches while keeping the false positive rate under control.
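To make the distribution-free calibration step concrete, here is a minimal sketch of one standard way to obtain finite-sample Type-I error control: an order-statistic threshold computed on held-out majority-class scores, in the style of split-conformal calibration. This is an illustration under assumptions, not the paper's published procedure. The function names `calibrate_threshold` and `conformal_p_value` are hypothetical, and the sketch assumes that a larger score means "more unlike the majority class" and that calibration scores are exchangeable with future majority-class scores.

```python
import math


def calibrate_threshold(majority_scores, alpha):
    """Pick a rejection threshold from held-out majority-class scores.

    If a new majority-class score is exchangeable with the calibration
    scores, then P(new score > threshold) <= alpha, whatever the score
    distribution is. (Assumed setup; not taken from the paper.)
    """
    n = len(majority_scores)
    # Rank of the order statistic that guarantees level alpha.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points to certify this level: never reject.
        return math.inf
    return sorted(majority_scores)[k - 1]


def conformal_p_value(score, majority_scores):
    """P-value for the null hypothesis 'this sample is majority-class'."""
    n = len(majority_scores)
    at_least_as_extreme = sum(s >= score for s in majority_scores)
    return (1 + at_least_as_extreme) / (n + 1)
```

A sample would be flagged as minority-class when its score exceeds the calibrated threshold. Note that the smallest certifiable level is 1/(n + 1), so a very small `alpha` requires a correspondingly large calibration set.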

📝 Abstract

Imbalanced classification remains a pervasive challenge in machine learning, particularly when minority samples are too scarce to provide a robust discriminative boundary. In such extreme scenarios, conventional models often suffer from unstable decision boundaries and a lack of reliable error control. To bridge the gap between generative modeling and discriminative classification, we propose VAE-Inf, a two-stage framework that integrates deep representation learning with statistically interpretable hypothesis testing. In the first stage, we adopt a one-class modeling perspective by training a variational autoencoder (VAE) exclusively on majority-class data to capture the underlying reference distribution. The resulting latent posteriors are aggregated via a Wasserstein barycenter to construct a global Gaussian reference model, providing a geometrically principled baseline for the majority class. In the second stage, we transform this generative foundation into a discriminative classifier by fine-tuning the encoder with limited minority samples. This is achieved through a novel distribution-aware loss that enforces probabilistic separation between classes based on variance-normalized projection statistics. For inference, we introduce a projection-based score that admits a natural hypothesis testing interpretation, allowing for a distribution-free calibration procedure. This approach yields exact finite-sample control of the Type-I error (false positive rate) without relying on restrictive parametric assumptions. Extensive experiments on diverse real-world benchmarks demonstrate that our framework achieves competitive performance against other approaches. The code is available upon request.
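The first-stage aggregation and the projection score can be illustrated with a short sketch. It assumes the usual VAE setting in which each latent posterior is a diagonal Gaussian; in that case the equal-weight 2-Wasserstein barycenter is again a diagonal Gaussian whose mean is the average of the means and whose standard deviation is the average of the standard deviations. The statistic shown is one plausible reading of "variance-normalized projection", not the paper's exact definition, and the names `gaussian_barycenter_diag`, `projection_statistic`, and the direction `w` are hypothetical.

```python
import math


def gaussian_barycenter_diag(means, stds):
    """Equal-weight 2-Wasserstein barycenter of diagonal Gaussians.

    means, stds: lists of per-sample latent means and standard deviations,
    each a list of length d. For diagonal covariances the barycenter has
    the averaged mean and the averaged standard deviation per coordinate.
    """
    n, d = len(means), len(means[0])
    mu = [sum(m[j] for m in means) / n for j in range(d)]
    sigma = [sum(s[j] for s in stds) / n for j in range(d)]
    return mu, sigma


def projection_statistic(z, mu, sigma, w):
    """Project z - mu onto direction w and divide by the projected std.

    Under the reference model N(mu, diag(sigma^2)) this quantity is
    standard normal. (Assumed form of the score, for illustration only.)
    """
    numerator = sum(wj * (zj - mj) for wj, zj, mj in zip(w, z, mu))
    variance = sum((wj * sj) ** 2 for wj, sj in zip(w, sigma))
    return numerator / math.sqrt(variance)
```

In this reading, scores computed on held-out majority samples would feed the calibration step, so the Type-I guarantee does not depend on the Gaussian reference model being exactly correct.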

Problem

Research questions and friction points this paper is trying to address.

imbalanced classification

minority samples

decision boundary

Type-I error

statistical interpretability

Innovation

Methods, ideas, or system contributions that make the work stand out.

VAE-Inf

imbalanced classification

statistically interpretable