🤖 AI Summary
Modeling complex high-dimensional distributions, such as images and climate data, remains challenging due to their strong nonlinearity and high inter-variable correlations. To address this, we propose the Inverse Markov Learning framework: a generic forward process first maps the target distribution to a known simple distribution, and a multi-step engression procedure then models the inverse reconstruction process. This work introduces the first extension of engression to multi-step inverse Markov chains, enabling arbitrary forward mechanisms, implicit dimensionality reduction, and natural discretization, thereby establishing a novel, efficient, and differentiable paradigm for discrete-time training of diffusion models. Key design elements include score-based generative guidance, joint mapping of noise and covariates, and end-to-end optimization. Experiments on synthetic benchmarks and real-world climate datasets show substantial improvements in distribution fidelity and sample quality, demonstrating the framework's effectiveness on highly nonlinear, strongly correlated, high-dimensional distributions.
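For reference, engression (Shen and Meinshausen, 2024) fits a generative model by minimizing a strictly proper scoring rule; the standard choice is the energy score, which for a model distribution $P$ and an observation $y$ reads:

```latex
\mathrm{ES}(P, y) \;=\; \mathbb{E}_{X \sim P}\,\|X - y\|
\;-\; \tfrac{1}{2}\,\mathbb{E}_{X, X' \sim P}\,\|X - X'\|,
\qquad X, X' \stackrel{\text{i.i.d.}}{\sim} P .
```

Because the energy score is strictly proper, its expected value is minimized exactly when $P$ matches the data-generating distribution, which is what justifies using it as a training loss for each reverse step.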
📝 Abstract
Learning complex distributions is a fundamental challenge in contemporary applications. Generative models, such as diffusion models, have demonstrated remarkable success in overcoming many limitations of traditional statistical methods. Shen and Meinshausen (2024) introduced engression, a generative approach based on scoring rules that maps noise (and covariates, if available) directly to data. While effective, engression struggles with highly complex distributions, such as those encountered in image data. In this work, we extend engression to improve its capability in learning complex distributions. We propose a framework that defines a general forward process transitioning from the target distribution to a known distribution (e.g., Gaussian) and then learns a reverse Markov process using multiple engression models. This reverse process reconstructs the target distribution step by step. Our approach supports general forward processes, allows for dimension reduction, and naturally discretizes the generative process. As a special case, when using a diffusion-based forward process, our framework offers a method to discretize the training and inference of diffusion models efficiently. Empirical evaluations on simulated and climate data validate our theoretical insights, demonstrating the effectiveness of our approach in capturing complex distributions.
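To make the framework concrete, below is a minimal 1-D sketch of the idea, not the paper's implementation: a bimodal target is pushed toward a Gaussian by a few additive-noise forward steps, and each reverse step is fit by a small engression model trained to minimize a Monte Carlo energy score. The linear location-scale model, the finite-difference gradient descent, and the step count are illustrative assumptions chosen for brevity; in practice each reverse model would be a neural network trained by backpropagation.

```python
import numpy as np

rng = np.random.default_rng(0)

def energy_score(gen, y):
    # Monte Carlo energy score E|G - y| - 0.5 E|G - G'|, averaged over rows.
    # gen: (n, m) generated samples, one row per observation; y: (n,) targets.
    t1 = np.mean(np.abs(gen - y[:, None]))
    t2 = 0.5 * np.mean(np.abs(gen[:, :, None] - gen[:, None, :]))
    return t1 - t2

def generate(params, z, eps):
    # Toy engression model g(z, eps) = a + b*z + s*eps (location-scale).
    a, b, s = params
    return a + b * z[:, None] + s * eps

def fit_step(z_next, z_prev, m=8, iters=150, lr=0.05, h=1e-4):
    # Fit one reverse model for z_prev | z_next by minimizing the energy
    # score with finite-difference gradient descent (for illustration only).
    params = np.array([0.0, 0.8, 1.0])
    eps = rng.normal(size=(len(z_next), m))
    loss = lambda p: energy_score(generate(p, z_next, eps), z_prev)
    for _ in range(iters):
        grad = np.array([(loss(params + h * e) - loss(params - h * e)) / (2 * h)
                         for e in np.eye(3)])
        params -= lr * grad
    return params

# Target: a bimodal mixture, a shape that one-shot generators find hard.
n, K, sigma = 500, 3, 1.0
x0 = np.where(rng.random(n) < 0.5,
              rng.normal(-2.0, 0.3, n), rng.normal(2.0, 0.3, n))

# Forward process: K Gaussian noising steps z_k = z_{k-1} + sigma * noise.
zs = [x0]
for _ in range(K):
    zs.append(zs[-1] + sigma * rng.normal(size=n))

# Reverse process: one engression model per step, trained on adjacent pairs.
models = [fit_step(zs[k + 1], zs[k]) for k in range(K)]

# Sampling: start from a Gaussian fit to z_K and apply the reverse chain.
z = rng.normal(zs[-1].mean(), zs[-1].std(), size=2000)
for k in reversed(range(K)):
    a, b, s = models[k]
    z = a + b * z + s * rng.normal(size=z.size)

print(round(z.mean(), 2), round(z.std(), 2))
```

With a linear model class the reverse chain can only track the target's first two moments; swapping in a flexible network per step is what lets the multi-step construction recover genuinely non-Gaussian structure such as the two modes here.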