$Z^2$-Sampling: Zero-Cost Zigzag Trajectories for Semantic Alignment in Diffusion Models

📅 2026-04-26

🤖 AI Summary

Existing Zigzag sampling improves semantic alignment, but it triples the computational cost and introduces off-manifold truncation errors that cause distributional drift. This work proposes $Z^2$-Sampling, which establishes, for the first time, that Zigzag trajectories are topologically reducible, enabling an implicit sampling mechanism with no additional computational cost. By combining implicit algebraic collapse with a dynamically cached Temporal Semantic Surrogate, $Z^2$-Sampling retains multi-step trajectory curvature information within the standard 2-NFE budget. Theoretically, the approach is shown to be equivalent to imposing a directional-derivative curvature penalty. Empirically, it is reported to advance the performance–efficiency Pareto frontier in both image and video generation, and to be compatible with diverse architectures (U-Net, DiT) and alignment frameworks (AYS, Diffusion-DPO).

📝 Abstract

Diffusion models have achieved unprecedented success in text-aligned generation, largely driven by Classifier-Free Guidance (CFG). However, standard CFG operates strictly on instantaneous gradients, omitting the intrinsic curvature of the data manifold. Recent methods like Zigzag-sampling (Z-Sampling) explicitly traverse multi-step forward-backward trajectories to probe this curvature, significantly improving semantic alignment. Yet, these explicit traversals triple the Neural Function Evaluation (NFE) cost and introduce unconstrained truncation errors from off-manifold evaluations, causing cumulative drift from the true marginal distribution. In this paper, we theoretically demonstrate that the explicit zigzag sequence is topologically reducible. We propose Implicit Z-Sampling, rigorously proving that intermediate states can be algebraically annihilated via operator dualities, physically eliminating off-manifold approximation errors. To push sampling efficiency to its theoretical lower bound, we introduce $Z^2$-Sampling (Zero-cost Zigzag Sampling). Exploiting the Probability Flow ODE's temporal coherence, $Z^2$-Sampling couples implicit algebraic collapse with a dynamically cached Temporal Semantic Surrogate. This restores the standard 2-NFE baseline without sacrificing semantic exploration. We formally prove via Backward Error Analysis that this discrete collapse inherently synthesizes a directional derivative curvature penalty. Finally, extensive evaluations demonstrate that $Z^2$-Sampling structurally shatters the performance-efficiency Pareto frontier. We validate its universal applicability across diverse architectures (U-Nets, DiTs) and modalities (image/video), establishing seamless orthogonality with advanced alignment frameworks (AYS, Diffusion-DPO).
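
The cost argument in the abstract can be illustrated with a toy NFE count. The sketch below is not the paper's method. It assumes the commonly described form of explicit Z-Sampling (a strongly guided denoising step, a weakly guided inversion step, then a second denoising step) and uses the standard CFG combination $\tilde{\epsilon} = \epsilon_u + w(\epsilon_c - \epsilon_u)$. `CountingModel`, the linear "noise prediction", the Euler update, and the guidance values are all hypothetical stand-ins chosen only to make the count visible.

```python
class CountingModel:
    """Hypothetical stand-in for a noise-prediction network that counts calls (NFE)."""

    def __init__(self):
        self.nfe = 0

    def __call__(self, x, t, cond):
        self.nfe += 1
        # Toy linear prediction; the conditional branch is shifted by a constant.
        return 0.1 * x + (0.05 if cond else 0.0)


def cfg_eps(model, x, t, scale):
    """Classifier-free guidance: one unconditional and one conditional call (2 NFE)."""
    eps_u = model(x, t, cond=False)
    eps_c = model(x, t, cond=True)
    return eps_u + scale * (eps_c - eps_u)


def denoise_step(model, x, t, dt, scale):
    """Toy Euler step from time t to t - dt."""
    return x - dt * cfg_eps(model, x, t, scale)


def invert_step(model, x, t, dt, scale):
    """Toy Euler step in the opposite direction, from time t back to t + dt."""
    return x + dt * cfg_eps(model, x, t, scale)


def explicit_zigzag_step(model, x, t, dt, high_scale, low_scale):
    """Assumed explicit zigzag: denoise, invert with weaker guidance, denoise again."""
    x_prev = denoise_step(model, x, t, dt, high_scale)          # 2 NFE
    x_back = invert_step(model, x_prev, t - dt, dt, low_scale)  # 2 NFE
    return denoise_step(model, x_back, t, dt, high_scale)       # 2 NFE


baseline = CountingModel()
denoise_step(baseline, x=1.0, t=1.0, dt=0.1, scale=5.5)

zigzag = CountingModel()
explicit_zigzag_step(zigzag, x=1.0, t=1.0, dt=0.1, high_scale=5.5, low_scale=0.0)

print(baseline.nfe, zigzag.nfe)  # 2 6
```

Under these assumptions, one explicit zigzag step costs 6 network evaluations against 2 for a plain CFG step, which is the threefold overhead that $Z^2$-Sampling is said to remove. How the paper collapses the intermediate states and caches the surrogate is not specified in the abstract, so it is not sketched here.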

Problem

Research questions and friction points this paper is trying to address.

semantic alignment

diffusion models

zigzag sampling

off-manifold errors

sampling efficiency

Innovation

Methods, ideas, or system contributions that make the work stand out.

$Z^2$-Sampling

Implicit Z-Sampling

semantic alignment