🤖 AI Summary
Current image generation models struggle to balance computational efficiency and generation quality: VAEs suffer from information loss and limited end-to-end trainability; pixel-space diffusion models incur high computational overhead; and cascaded architectures face distribution mismatch, knowledge fragmentation, and difficulties in joint optimization due to their staged design. To address these limitations, we propose a unified multi-stage diffusion framework grounded in conditional dependency coupling. Our approach models image generation as a multi-step interpolation trajectory and implements it via a single Diffusion Transformer that enables cross-stage parameter sharing and end-to-end joint optimization. Leveraging stochastic interpolation and conditional coupling, the framework performs multi-scale modeling directly in pixel space. Experiments demonstrate that our method achieves high-fidelity generation across diverse resolutions while maintaining efficient inference, significantly outperforming state-of-the-art VAE- and cascade-based systems.
📝 Abstract
Existing image generation models face a critical trade-off between computational cost and fidelity. Models that rely on a pretrained Variational Autoencoder (VAE) suffer from information loss, limited detail, and an inability to support end-to-end training; models operating directly in pixel space incur prohibitive computational cost. Cascade models mitigate this cost, but their stage-wise separation prevents effective end-to-end optimization, hampers knowledge sharing, and often leads to inaccurate distribution learning within each stage. To address these challenges, we introduce a unified multistage generative framework based on our proposed Conditional Dependent Coupling strategy. It decomposes the generative process into interpolant trajectories across multiple stages, ensuring accurate distribution learning while enabling end-to-end optimization. Importantly, the entire process is modeled by a single unified Diffusion Transformer, eliminating disjoint modules and enabling knowledge sharing across stages. Extensive experiments demonstrate that our method achieves both high fidelity and efficiency across multiple resolutions.
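To make the interpolant-trajectory idea concrete, here is an illustrative sketch based on standard stochastic interpolants. The notation ($\alpha$, $\beta$, the stage index $s$, and the exact form of the conditioning) is an assumption for exposition, not the paper's own formulation:

```latex
% Illustrative sketch only; notation is assumed, not taken from the paper.
% Per-stage interpolant between Gaussian noise x_0 and stage-s data x_1:
\[
  x_t^{(s)} \;=\; \alpha(t)\,x_0^{(s)} \;+\; \beta(t)\,x_1^{(s)},
  \qquad t \in [0,1],
  \quad \text{e.g. } \alpha(t) = 1 - t,\ \beta(t) = t,
\]
% Conditional coupling: stage s conditions on the previous stage's output
% x^{(s-1)} (e.g. a lower-resolution image), and a single shared Diffusion
% Transformer v_\theta is trained to match the trajectory's velocity:
\[
  \min_\theta \;\; \mathbb{E}\,
  \Bigl\| \, v_\theta\bigl(x_t^{(s)},\, t,\, s,\, x^{(s-1)}\bigr)
  \;-\; \bigl(\alpha'(t)\,x_0^{(s)} + \beta'(t)\,x_1^{(s)}\bigr) \Bigr\|^2 .
\]
```

Under this reading, sharing one network $v_\theta$ across all stages (rather than training one model per cascade stage) is what permits end-to-end optimization and cross-stage knowledge sharing.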