AI Summary
This work proposes a two-stage conditional generative framework tailored for settings with scarce paired data. The approach first leverages both paired and unpaired data to learn a low-dimensional latent space, then performs joint distribution matching exclusively on the paired data within this space using the 1-Wasserstein distance, enabling efficient one-step generation. Notably, this is the first study to incorporate unpaired data into semi-supervised conditional generative modeling, substantially improving geometric fidelity. The framework further provides a unified statistical perspective and theoretical foundation for latent-space methods such as Latent Diffusion Models. Theoretical analysis includes non-asymptotic error bounds and their connection to score matching, while experiments on class-conditional generation and image super-resolution tasks demonstrate the method's effectiveness and superiority.
Abstract
We introduce Latent Space Distribution Matching (LSDM), a novel framework for semi-supervised generative modeling of conditional distributions. LSDM operates in two stages: (i) learning a low-dimensional latent space from both paired and unpaired data, and (ii) performing joint distribution matching in this space via the 1-Wasserstein distance, using only paired data. This two-step approach minimizes an upper bound on the 1-Wasserstein distance between joint distributions, reducing reliance on scarce paired samples while enabling fast one-step generation. Theoretically, we establish non-asymptotic error bounds and demonstrate a key benefit of unpaired data: enhanced geometric fidelity in generated outputs. Furthermore, by extending the scope of its two core steps, LSDM provides a coherent statistical perspective that connects to a broad class of latent-space approaches. Notably, Latent Diffusion Models (LDMs) can be viewed as a variant of LSDM, in which joint distribution matching is achieved indirectly via score matching. Consequently, our results also provide theoretical insights into the consistency of LDMs. Empirical evaluations on real-world image tasks, including class-conditional generation and image super-resolution, demonstrate the effectiveness of LSDM in leveraging unpaired data to enhance generation quality.
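For reference, the 1-Wasserstein distance that stage (ii) matches under has a standard definition and an equivalent dual form, reproduced below. The symbols $P$, $Q$, $\Pi(P, Q)$, and $f$ are notation assumed for this note rather than taken from the paper, and the paper's specific upper bound is not reproduced here.

```latex
% Standard definition of the 1-Wasserstein distance between distributions P and Q.
% \Pi(P, Q) is the set of couplings (joint distributions) with marginals P and Q.
\begin{align*}
W_1(P, Q)
  &= \inf_{\gamma \in \Pi(P, Q)} \mathbb{E}_{(u, v) \sim \gamma}\bigl[\lVert u - v \rVert\bigr] \\
  % Kantorovich-Rubinstein duality: supremum over 1-Lipschitz functions f.
  &= \sup_{\lVert f \rVert_{\mathrm{Lip}} \le 1}
     \Bigl( \mathbb{E}_{u \sim P}[f(u)] - \mathbb{E}_{v \sim Q}[f(v)] \Bigr)
\end{align*}
```

In the setting described above, $P$ and $Q$ would be joint distributions over (condition, latent) pairs. The dual form is commonly approximated from samples with a critic constrained to be 1-Lipschitz; whether LSDM uses this particular estimator is not stated in the abstract.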