🤖 AI Summary
While diffusion models excel at image and video generation, they struggle to capture high-level semantic conditional dependencies, such as physical laws or object stability, because they model the data only implicitly and globally.
Method: We study autoregressive diffusion models (AR-DM), which explicitly encode conditional dependencies among data tokens by generating tokens autoregressively, each via a diffusion process conditioned on the previously generated tokens.
Contribution/Results: Theoretically, we derive the first sampling error upper bound for AR diffusion models under (possibly) the mildest data assumption, and show that, relative to standard DDPM, the AR variant approximates the data's conditional distributions with a smaller gap; the presence or absence of conditional dependence structure in the data emerges as the key factor governing the performance gap. Empirically, AR-DM significantly outperforms DDPM on datasets with explicit conditional dependence structure, performs on par with DDPM when such structure is absent, and incurs only moderate inference overhead. This work provides a theoretical foundation for strengthening the high-level semantic modeling capability of diffusion models.
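The AR sampling scheme described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `denoise_step` is a hypothetical stand-in for a learned noise predictor, and the token count, step count, and conditioning rule are arbitrary choices for the sketch. The point it shows is structural: each token starts from pure noise and runs its own reverse diffusion, but the denoiser for token *k* sees all previously generated tokens as context, which is where the explicit conditional dependence enters.

```python
import numpy as np

def denoise_step(x_t, t, context):
    # Hypothetical denoiser standing in for a learned score/noise model.
    # For illustration it simply contracts x_t toward the mean of the
    # conditioning context (0 when no tokens have been generated yet).
    target = context.mean() if context.size else 0.0
    return x_t + 0.5 * (target - x_t)

def ar_diffusion_sample(num_tokens=4, num_steps=10, token_dim=2, seed=0):
    """Generate tokens one at a time; each token's reverse diffusion is
    conditioned on all previously generated tokens (the AR structure)."""
    rng = np.random.default_rng(seed)
    tokens = []
    for _ in range(num_tokens):
        # Conditioning context: everything generated so far.
        context = np.concatenate(tokens) if tokens else np.empty(0)
        x = rng.standard_normal(token_dim)    # start from pure noise
        for t in reversed(range(num_steps)):  # reverse diffusion for this token
            x = denoise_step(x, t, context)
        tokens.append(x)
    return np.stack(tokens)
```

A vanilla DDPM, by contrast, would denoise all tokens jointly in a single reverse process, leaving cross-token dependence implicit; the extra cost of the AR variant is the sequential loop over tokens, which matches the abstract's point that inference time grows only moderately.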
📝 Abstract
Diffusion models have demonstrated appealing performance in both image and video generation. However, many works have found that they struggle to capture important, high-level relationships present in the real world: for example, they fail to learn physical laws from data, and even fail to understand that objects in the world persist in a stable fashion. This is because important conditional dependence structures are not adequately captured by vanilla diffusion models. In this work, we initiate an in-depth study on strengthening diffusion models to capture the conditional dependence structures in the data. In particular, we examine the efficacy of auto-regressive (AR) diffusion models for this purpose and develop the first theoretical results on the sampling error of AR diffusion models under (possibly) the mildest data assumption. Our theoretical findings indicate that, compared with typical diffusion models, the AR variant produces samples with a reduced gap in approximating the data conditional distribution. Moreover, the overall inference time of AR diffusion models is only moderately larger than that of vanilla diffusion models, keeping them practical for large-scale applications. We also provide empirical results showing that when there is a clear conditional dependence structure in the data, AR diffusion models capture it, whereas vanilla DDPM fails to do so. On the other hand, when there is no obvious conditional dependence across patches of the data, AR diffusion does not outperform DDPM.