Efficient Autoregressive Inference for Transformer Probabilistic Models

📅 2025-10-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
Transformer-based probabilistic models for joint distribution forecasting must balance efficient autoregressive generation against flexible set-conditional modeling. This paper addresses that tension with Causal Autoregressive Caching (CAC). CAC decouples context encoding from updates to the conditioning set via a dynamic cache buffer, enabling single-shot context encoding, multi-step cache reuse, and batched autoregressive decoding; together these support efficient joint log-likelihood evaluation and unified set-conditional/autoregressive training. By combining cache-context attention with target-dependent dynamic modeling, CAC achieves substantial inference speedups without sacrificing conditional modeling flexibility. Experiments on synthetic functions, EEG time series, cognitive modeling tasks, and tabular data show that CAC matches the accuracy of state-of-the-art baselines while accelerating joint sampling by up to 20×.


📝 Abstract
Transformer-based models for amortized probabilistic inference, such as neural processes, prior-fitted networks, and tabular foundation models, excel at single-pass marginal prediction. However, many real-world applications, from signal interpolation to multi-column tabular predictions, require coherent joint distributions that capture dependencies between predictions. While purely autoregressive architectures efficiently generate such distributions, they sacrifice the flexible set-conditioning that makes these models powerful for meta-learning. Conversely, the standard approach to obtain joint distributions from set-based models requires expensive re-encoding of the entire augmented conditioning set at each autoregressive step. We introduce a causal autoregressive buffer that preserves the advantages of both paradigms. Our approach decouples context encoding from updating the conditioning set. The model processes the context once and caches it. A dynamic buffer then captures target dependencies: as targets are incorporated, they enter the buffer and attend to both the cached context and previously buffered targets. This enables efficient batched autoregressive generation and one-pass joint log-likelihood evaluation. A unified training strategy allows seamless integration of set-based and autoregressive modes at minimal additional cost. Across synthetic functions, EEG signals, cognitive models, and tabular data, our method matches predictive accuracy of strong baselines while delivering up to 20 times faster joint sampling. Our approach combines the efficiency of autoregressive generative models with the representational power of set-based conditioning, making joint prediction practical for transformer-based probabilistic models.
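As a rough illustration of the buffering idea described above (and not the paper's implementation), the sketch below uses plain NumPy with single-head dot-product attention: the context is encoded once and cached, and each new target attends to the cached context plus previously buffered targets, so the context is never re-encoded. All names, shapes, and the identity "encoder" here are placeholder assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # feature dimension (placeholder)

def attend(q, kv):
    """Single-head dot-product attention of queries q over keys/values kv."""
    scores = q @ kv.T / np.sqrt(d)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ kv

# 1) Encode the context set once and cache it.
context = rng.normal(size=(8, d))
cache = context  # in a real model: transformer-encoded context representations

# 2) Autoregressive decoding with a dynamic buffer: each new target attends
#    to the cached context plus all previously buffered targets.
buffer = np.empty((0, d))
outputs = []
for step in range(4):
    target = rng.normal(size=(1, d))          # next target embedding
    memory = np.concatenate([cache, buffer])  # cached context + buffered targets
    outputs.append(attend(target, memory))
    buffer = np.concatenate([buffer, target]) # grow the buffer causally

print(len(outputs), outputs[0].shape)  # → 4 (1, 16)
```

Note the per-step cost grows only with the buffer, while the standard set-based approach would re-encode the full augmented conditioning set (context plus all accepted targets) at every step.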
Problem

Research questions and friction points this paper is trying to address.

Achieving efficient joint distribution sampling in transformer probabilistic models
Balancing autoregressive generation with flexible set-conditioning capabilities
Eliminating expensive re-encoding during autoregressive inference steps
Innovation

Methods, ideas, or system contributions that make the work stand out.

Causal autoregressive buffer decouples context encoding from conditioning-set updates
Dynamic buffer captures target dependencies during generation
Unified training integrates set-based and autoregressive modes
Conor Hassan
Department of Computer Science, Aalto University, Finland
Nasrulloh Loka
University of Helsinki
Machine Learning, Bayesian Optimization, Deep Learning
Cen-You Li
Department of Computer Science, University of Helsinki, Finland
Daolang Huang
Aalto University
Machine Learning, Bayesian Inference, Meta Learning
Paul E. Chang
Department of Computer Science, University of Helsinki, Finland
Yang Yang
Department of Computer Science, University of Helsinki, Finland
Francesco Silvestrin
Department of Computer Science, University of Helsinki, Finland
Samuel Kaski
Director, ELLIS Institute Finland; Professor, Aalto University and University of Manchester
Probabilistic Machine Learning, AI4Science, Collaborative AI
Luigi Acerbi
Associate Professor of Machine and Human Intelligence, University of Helsinki
Machine Learning, Bayesian Optimization, Computational Neuroscience, Probabilistic Inference