Your VAR Model is Secretly an Efficient and Explainable Generative Classifier

📅 2025-10-13

🤖 AI Summary

Existing generative classifiers rely predominantly on diffusion models, whose substantial computational cost limits scalability. To address this, we propose a generative classifier built on visual autoregressive (VAR) modeling, together with an enhanced variant, A-VARC+, which achieves a better trade-off between accuracy and inference speed. Because the VAR model has a tractable likelihood, the classifier supports visual explanations via token-wise mutual information and shows inherent resistance to catastrophic forgetting in class-incremental learning.

📝 Abstract

Generative classifiers, which leverage conditional generative models for classification, have recently demonstrated desirable properties such as robustness to distribution shifts. However, recent progress in this area has been largely driven by diffusion-based models, whose substantial computational cost severely limits scalability. This exclusive focus on diffusion-based methods has also constrained our understanding of generative classifiers. In this work, we propose a novel generative classifier built on recent advances in visual autoregressive (VAR) modeling, which offers a new perspective for studying generative classifiers. To further enhance its performance, we introduce the Adaptive VAR Classifier$^+$ (A-VARC$^+$), which achieves a superior trade-off between accuracy and inference speed, thereby significantly improving practical applicability. Moreover, we show that the VAR-based method exhibits fundamentally different properties from diffusion-based methods. In particular, due to its tractable likelihood, the VAR-based classifier enables visual explainability via token-wise mutual information and demonstrates inherent resistance to catastrophic forgetting in class-incremental learning tasks.
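
The decision rule behind a likelihood-based generative classifier can be sketched in a few lines. The following is a minimal illustration, not the paper's A-VARC+ implementation: it assumes that some conditional model has already produced per-token log-likelihoods log p(x_t | x_<t, y) for every candidate class, and it treats the likelihood as a plain token-by-token product rather than modeling VAR's multi-scale structure. The function names `classify` and `token_pmi`, the array layout, and the use of pointwise mutual information as the token-wise score are all assumptions made for illustration.

```python
import numpy as np


def _logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    out = m + np.log(np.sum(np.exp(a - m), axis=axis, keepdims=True))
    return np.squeeze(out, axis=axis)


def _uniform_log_prior(num_classes):
    return np.full(num_classes, -np.log(num_classes))


def classify(token_logp, log_prior=None):
    """Bayes-rule classification from class-conditional token log-likelihoods.

    token_logp: array of shape (num_classes, num_tokens), where entry [y, t]
    is log p(x_t | x_<t, y) (hypothetical layout, assumed for this sketch).
    Returns the predicted class index and the posterior p(y | x).
    """
    token_logp = np.asarray(token_logp, dtype=float)
    if log_prior is None:
        log_prior = _uniform_log_prior(token_logp.shape[0])
    joint = token_logp.sum(axis=1) + log_prior   # log p(x, y)
    log_post = joint - _logsumexp(joint, axis=0)  # log p(y | x)
    return int(np.argmax(log_post)), np.exp(log_post)


def token_pmi(token_logp, label, log_prior=None):
    """Per-token pointwise mutual information between token x_t and `label`.

    pmi[t] = log p(x_t | x_<t, label) - log p(x_t | x_<t), where the marginal
    mixes over classes using the posterior given the prefix x_<t.
    """
    token_logp = np.asarray(token_logp, dtype=float)
    num_classes = token_logp.shape[0]
    if log_prior is None:
        log_prior = _uniform_log_prior(num_classes)
    # Exclusive cumulative sum: log p(x_<t, y) for every class and position.
    cum = np.cumsum(token_logp, axis=1)
    prefix = np.concatenate([np.zeros((num_classes, 1)), cum[:, :-1]], axis=1)
    prefix = prefix + log_prior[:, None]
    # log p(x_t | x_<t) = log sum_y p(x_<=t, y) - log sum_y p(x_<t, y)
    log_marginal = _logsumexp(prefix + token_logp, axis=0) - _logsumexp(prefix, axis=0)
    return token_logp[label] - log_marginal


# Toy example: 2 classes, 3 tokens, made-up probabilities.
toy_logp = np.log(np.array([[0.5, 0.2, 0.9],
                            [0.1, 0.3, 0.4]]))
pred, posterior = classify(toy_logp)
pmi = token_pmi(toy_logp, pred)
```

Under these assumptions the per-token scores sum to log p(y | x) - log p(y), so each token's score can be read as its contribution to the classification decision. This is only possible because the likelihood is available in closed form, which is the property the abstract attributes to the VAR-based classifier.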

Problem

Research questions and friction points this paper is trying to address.

Developing efficient generative classifiers beyond diffusion models

Improving the accuracy-speed trade-off in visual autoregressive classification

Enabling explainability and resistance to catastrophic forgetting in generative classifiers

Innovation

Methods, ideas, or system contributions that make the work stand out.

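Generative classifier built on visual autoregressive (VAR) modeling

Adaptive VAR Classifier+ (A-VARC+) achieves a better trade-off between accuracy and inference speed

Tractable likelihood enables explanations via token-wise mutual information and resistance to catastrophic forgetting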
