🤖 AI Summary
Diffusion models suffer from the curse of dimensionality in high-dimensional generative modeling, since globally estimating the score function incurs prohibitively high sample complexity. To address this, we propose a **localized diffusion modeling framework**, the first to explicitly incorporate a structural locality assumption (constraining score learning to input neighborhoods) and to prove theoretically that it circumvents the curse of dimensionality. Methodologically, we design a localized score matching loss and characterize the fundamental trade-off between statistical error and localization bias; we further introduce localized neural architectures and a parallelizable training scheme. Experiments demonstrate that moderate localization radii significantly improve generative quality and generalization in finite-sample regimes while enabling efficient parallel training. Our core contribution is the first theoretically grounded, locally structured paradigm for high-dimensional diffusion modeling.
📝 Abstract
Diffusion models are state-of-the-art tools for a wide range of generative tasks. However, the need to estimate high-dimensional score functions makes them potentially vulnerable to the curse of dimensionality (CoD). This underscores the importance of better understanding and exploiting low-dimensional structure in the target distribution. In this work, we consider locality structure, which describes sparse dependencies among the components of the target distribution. Under locality structure, the score function is effectively low-dimensional, so it can be estimated by a localized neural network with significantly reduced sample complexity. This motivates the localized diffusion model, in which a localized score matching loss is used to train the score function within a localized hypothesis space. We prove that such localization enables diffusion models to circumvent the CoD, at the price of an additional localization error. Under realistic sample-size scaling, we show both theoretically and numerically that a moderate localization radius can balance the statistical and localization errors, leading to better overall performance. The localized structure also facilitates parallel training of diffusion models, making them potentially more efficient for large-scale applications.
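To make the localized score matching idea concrete, here is a minimal NumPy sketch under a simple 1-D chain locality assumption: the i-th score component is predicted only from coordinates within graph distance `r` of `i`, and each component is fitted by a denoising score matching objective. The helper names (`neighborhood`, `localized_dsm_loss`, `score_fns`) and the chain geometry are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def neighborhood(i, r, d):
    """Indices within graph distance r of coordinate i on a 1-D chain (assumed geometry)."""
    return np.arange(max(0, i - r), min(d, i + r + 1))

def localized_dsm_loss(x0, sigma, score_fns, r, rng):
    """Localized denoising score matching loss (illustrative sketch).

    x0        : (n, d) array of clean samples
    sigma     : noise level of the forward perturbation
    score_fns : list of d callables; score_fns[i] maps the local patch
                xt[:, neighborhood(i, r, d)] to the i-th score component
    r         : localization radius
    """
    n, d = x0.shape
    eps = rng.standard_normal((n, d))
    xt = x0 + sigma * eps           # noised samples
    target = -eps / sigma           # DSM target: grad log p(xt | x0) = -(xt - x0) / sigma^2
    loss = 0.0
    for i in range(d):
        idx = neighborhood(i, r, d)
        pred = score_fns[i](xt[:, idx])   # each component sees only its local patch
        loss += np.mean((pred - target[:, i]) ** 2)
    return loss / d
```

Because each `score_fns[i]` only ever reads its own patch, the d per-coordinate regression problems are independent given the noised data, which is what makes the training loop trivially parallelizable; shrinking `r` reduces the effective input dimension (and thus the statistical error) at the cost of the localization bias discussed above.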