Efficient Autoregressive Inference for Transformer Probabilistic Models

📅 2025-10-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
Transformer-based probabilistic models for joint distribution forecasting must balance efficient autoregressive generation against flexible set-conditional modeling. This paper addresses that tension with Causal Autoregressive Caching (CAC). CAC decouples context encoding from updates to the conditioning set via a dynamic cache buffer, enabling single-shot context encoding, multi-step cache reuse, and batched autoregressive decoding; together these support efficient joint log-likelihood evaluation and unified set-conditional/autoregressive training. By combining cache-context attention with target-dependent dynamic modeling, CAC achieves substantial inference speedups without sacrificing conditional modeling flexibility. Experiments on synthetic functions, EEG time series, cognitive modeling tasks, and tabular data show that CAC matches the accuracy of state-of-the-art baselines while accelerating joint sampling by up to 20×.


📝 Abstract
Transformer-based models for amortized probabilistic inference, such as neural processes, prior-fitted networks, and tabular foundation models, excel at single-pass marginal prediction. However, many real-world applications, from signal interpolation to multi-column tabular predictions, require coherent joint distributions that capture dependencies between predictions. While purely autoregressive architectures efficiently generate such distributions, they sacrifice the flexible set-conditioning that makes these models powerful for meta-learning. Conversely, the standard approach to obtain joint distributions from set-based models requires expensive re-encoding of the entire augmented conditioning set at each autoregressive step. We introduce a causal autoregressive buffer that preserves the advantages of both paradigms. Our approach decouples context encoding from updating the conditioning set. The model processes the context once and caches it. A dynamic buffer then captures target dependencies: as targets are incorporated, they enter the buffer and attend to both the cached context and previously buffered targets. This enables efficient batched autoregressive generation and one-pass joint log-likelihood evaluation. A unified training strategy allows seamless integration of set-based and autoregressive modes at minimal additional cost. Across synthetic functions, EEG signals, cognitive models, and tabular data, our method matches predictive accuracy of strong baselines while delivering up to 20 times faster joint sampling. Our approach combines the efficiency of autoregressive generative models with the representational power of set-based conditioning, making joint prediction practical for transformer-based probabilistic models.
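As a rough illustration of the buffering idea described above (and not the paper's implementation), the sketch below uses plain NumPy with single-head dot-product attention: the context is encoded once and cached, and each new target attends to the cached context plus previously buffered targets, so the context is never re-encoded. All names, shapes, and the identity "encoder" here are placeholder assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # feature dimension (placeholder)

def attend(q, kv):
    """Single-head dot-product attention of queries q over keys/values kv."""
    scores = q @ kv.T / np.sqrt(d)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ kv

# 1) Encode the context set once and cache it.
context = rng.normal(size=(8, d))
cache = context  # in a real model: transformer-encoded context representations

# 2) Autoregressive decoding with a dynamic buffer: each new target attends
#    to the cached context plus all previously buffered targets.
buffer = np.empty((0, d))
outputs = []
for step in range(4):
    target = rng.normal(size=(1, d))          # next target embedding
    memory = np.concatenate([cache, buffer])  # cached context + buffered targets
    outputs.append(attend(target, memory))
    buffer = np.concatenate([buffer, target]) # grow the buffer causally

print(len(outputs), outputs[0].shape)  # → 4 (1, 16)
```

Note the per-step cost grows only with the buffer, while the standard set-based approach would re-encode the full augmented conditioning set (context plus all accepted targets) at every step.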
Problem

Research questions and friction points this paper is trying to address.

Achieving efficient joint distribution sampling in transformer probabilistic models
Balancing autoregressive generation with flexible set-conditioning capabilities
Eliminating expensive re-encoding during autoregressive inference steps
Innovation

Methods, ideas, or system contributions that make the work stand out.

Causal autoregressive buffer decouples context encoding from conditioning-set updates
Dynamic buffer captures target dependencies during generation
Unified training integrates set-based and autoregressive modes
Conor Hassan
Department of Computer Science, Aalto University, Finland
Nasrulloh Loka
University of Helsinki
Machine Learning, Bayesian Optimization, Deep Learning
Cen-You Li
Department of Computer Science, University of Helsinki, Finland
Daolang Huang
Aalto University
Machine Learning, Bayesian Inference, Meta Learning
Paul E. Chang
Department of Computer Science, University of Helsinki, Finland
Yang Yang
Department of Computer Science, University of Helsinki, Finland
Francesco Silvestrin
Department of Computer Science, University of Helsinki, Finland
Samuel Kaski
Director, ELLIS Institute Finland; Professor, Aalto University and University of Manchester
Probabilistic Machine Learning, AI4Science, Collaborative AI
Luigi Acerbi
Associate Professor of Machine and Human Intelligence, University of Helsinki
Machine Learning, Bayesian Optimization, Computational Neuroscience, Probabilistic Inference