Inference Acceleration of Autoregressive Normalizing Flows by Selective Jacobi Decoding

📅 2025-05-30

📈 Citations: 0

✨ Influential: 0

career value

201K/year

🤖 AI Summary

Autoregressive normalizing flows suffer from slow inference due to strict sequential dependencies that preclude parallelization. This paper proposes Selective Jacobian Decoding (SeJD), the first method to empirically identify and exploit a hierarchical redundancy pattern in autoregressive generation: lower-layer dependencies exhibit low redundancy, whereas higher-layer ones are highly redundant—enabling relaxation of full-sequence dependency constraints. SeJD integrates local Jacobian approximation, hierarchical dependency modeling, and parallel iterative optimization, yielding a decoding mechanism with theoretically guaranteed superlinear convergence and iteration count bounded by that of sequential sampling. Experiments across multiple datasets demonstrate up to 4.7× inference speedup without compromising generation quality or fidelity. The core contributions are: (i) uncovering the layered dependency redundancy property in autoregressive flows; and (ii) introducing the first parallel autoregressive flow decoding framework that jointly ensures rigorous convergence guarantees and practical acceleration.

Technology Category

Application Category

📝 Abstract

Normalizing flows are promising generative models with advantages such as theoretical rigor, analytical log-likelihood computation, and end-to-end training. However, the architectural constraints to ensure invertibility and tractable Jacobian computation limit their expressive power and practical usability. Recent advancements utilize autoregressive modeling, significantly enhancing expressive power and generation quality. However, such sequential modeling inherently restricts parallel computation during inference, leading to slow generation that impedes practical deployment. In this paper, we first identify that strict sequential dependency in inference is unnecessary to generate high-quality samples. We observe that patches in sequential modeling can also be approximated without strictly conditioning on all preceding patches. Moreover, the models tend to exhibit low dependency redundancy in the initial layer and higher redundancy in subsequent layers. Leveraging these observations, we propose a selective Jacobi decoding (SeJD) strategy that accelerates autoregressive inference through parallel iterative optimization. Theoretical analyses demonstrate the method's superlinear convergence rate and guarantee that the number of iterations required is no greater than the original sequential approach. Empirical evaluations across multiple datasets validate the generality and effectiveness of our acceleration technique. Experiments demonstrate substantial speed improvements up to 4.7 times faster inference while keeping the generation quality and fidelity.

Problem

Research questions and friction points this paper is trying to address.

Autoregressive normalizing flows suffer from slow sequential inference.

Strict sequential dependency is unnecessary for high-quality sample generation.

Existing models exhibit redundant dependencies across different layers.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Selective Jacobi decoding for parallel optimization

Reduces dependency redundancy in autoregressive layers

Superlinear convergence with guaranteed iteration bounds

🔎 Similar Papers

No similar papers found.