🤖 AI Summary
This paper investigates the fundamental distinction between autoregressive models and deep temporal models in non-Markovian sequence modeling, focusing on decoupling the model architecture, i.e., how the predictive distribution factorizes, from the computations actually performed at inference.
Method: We propose a structured context-access mechanism that, within the standard Transformer architecture, enables hierarchical temporal factorization via iterative inference, thereby decoupling the logical construction of predictions from network design. Crucially, this preserves the conventional next-token prediction training paradigm while explicitly separating modeling assumptions from inference scheduling.
Contribution/Results: The proposed lightweight inference path maintains predictive performance while substantially reducing computational cost. Experiments provide systematic evidence that decoupling architecture from inference is effective for modeling non-Markovian dependencies, suggesting a principled framework for temporally aware sequence modeling beyond strict autoregression.
📝 Abstract
Parr et al. (2025) examine how auto-regressive and deep temporal models differ in their treatment of non-Markovian sequence modelling. Building on this, we highlight the need to dissociate model architectures, i.e., how the predictive distribution factorises, from the computations invoked at inference. We demonstrate that autoregressive models can mimic deep temporal computations by structuring context access during iterative inference. Using a transformer trained on next-token prediction, we show that inducing hierarchical temporal factorisation during iterative inference maintains predictive capacity while instantiating fewer computations. This emphasises that the processes for constructing and refining predictions are not necessarily bound to their underlying model architectures.
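The paper does not publish an implementation here, so the sketch below is only one plausible reading of "structuring context access": at each inference step the model attends to recent tokens at full resolution and to older tokens at exponentially coarser strides, so the visible context grows roughly logarithmically rather than linearly with sequence length. The function name, the stride schedule, and the `base` parameter are all illustrative assumptions, not the authors' method.

```python
def hierarchical_context(t, base=2):
    """Indices of past tokens visible when predicting token t under a
    hypothetical hierarchical context-access schedule: the most recent
    tokens are kept at stride 1, and each older block is sampled at a
    stride that grows by a factor of `base`.

    NOTE: this schedule is an illustrative assumption, not the scheme
    used in the paper.
    """
    idxs = set()
    stride, lo = 1, t
    while lo > 0:
        hi = lo
        # Each level covers `base` tokens at the current stride,
        # then the stride coarsens for the older remainder.
        lo = max(0, lo - stride * base)
        idxs.update(range(lo, hi, stride))
        stride *= base
    return sorted(idxs)
```

For example, `hierarchical_context(8)` exposes only 5 of the 8 past positions, and the saving grows with sequence length, which is the sense in which such a factorisation could instantiate fewer computations than attending to the full history at every step.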