🤖 AI Summary
Standard diffusion models corrupt images with Gaussian noise whose Fourier phases are random. This phase randomization destroys spatial structure, which limits the models' use in geometry-sensitive tasks such as re-rendering, simulation enhancement, and image-to-image translation. To address this, we propose Phase-Preserving Diffusion (φ-PD), a model-agnostic reformulation of the diffusion process that preserves the input's Fourier phase and randomizes only its magnitude. This enables structure-aligned generation without changing the model architecture or adding parameters. Frequency-Selective Structured (FSS) noise adds a single frequency-cutoff hyperparameter that continuously controls structural rigidity. φ-PD is compatible with image and video diffusion models, adds no inference-time cost, and is complementary to existing conditioning methods. Applied to the CARLA simulator, φ-PD improves CARLA-to-Waymo planner performance by 50%, and it produces controllable, spatially aligned results in photorealistic and stylized re-rendering.
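The core idea, noise that keeps the input's Fourier phase but takes its magnitude from Gaussian noise, can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions (a 2D FFT over the last two axes of a channels-first array, with magnitudes taken from the FFT of white Gaussian noise); `phase_preserving_noise` is a hypothetical helper, not the authors' released code.

```python
import numpy as np


def phase_preserving_noise(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical sketch: noise with Gaussian Fourier magnitudes and the image's Fourier phases.

    Assumes `image` is a real array of shape (..., H, W); the FFT runs over the last two axes.
    """
    # Spectrum of the structure source (the input image).
    img_f = np.fft.fft2(image, axes=(-2, -1))
    # Spectrum of ordinary white Gaussian noise with the same shape.
    noise_f = np.fft.fft2(rng.standard_normal(image.shape), axes=(-2, -1))
    # Keep the noise magnitude, replace its phase with the image phase.
    combined = np.abs(noise_f) * np.exp(1j * np.angle(img_f))
    # Both factors are symmetric under k -> -k, so `combined` is Hermitian and the
    # inverse FFT is real up to rounding error; `.real` drops that residue.
    return np.fft.ifft2(combined, axes=(-2, -1)).real


rng = np.random.default_rng(0)
img = rng.random((3, 32, 32))  # stand-in for a 3-channel 32x32 image
eps = phase_preserving_noise(img, np.random.default_rng(1))
print(eps.shape)
```

Because the magnitudes still come from white Gaussian noise, the sample keeps Gaussian-like per-frequency energy, while the preserved phase carries the image's edges and layout.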
📝 Abstract
Standard diffusion models corrupt data with Gaussian noise whose Fourier coefficients have random magnitudes and random phases. While effective for unconditional or text-to-image generation, corrupting the phase components destroys spatial structure, making this noise ill-suited for tasks requiring geometric consistency, such as re-rendering, simulation enhancement, and image-to-image translation. We introduce Phase-Preserving Diffusion (φ-PD), a model-agnostic reformulation of the diffusion process that preserves the input phase while randomizing the magnitude, enabling structure-aligned generation without architectural changes or additional parameters. We further propose Frequency-Selective Structured (FSS) noise, which provides continuous control over structural rigidity via a single frequency-cutoff parameter. φ-PD adds no inference-time cost and is compatible with any diffusion model for images or videos. Across photorealistic and stylized re-rendering, as well as sim-to-real enhancement for driving planners, φ-PD produces controllable, spatially aligned results. When applied to the CARLA simulator, φ-PD improves CARLA-to-Waymo planner performance by 50%. The method is complementary to existing conditioning approaches and broadly applicable to image-to-image and video-to-video generation. Videos, additional examples, and code are available on our project page: https://yuzeng-at-tri.github.io/ppd-page/
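A single frequency cutoff can interpolate between ordinary Gaussian noise and fully phase-preserving noise. The sketch below assumes, hypothetically, that FSS noise keeps the input phase at spatial frequencies whose radius is at most the cutoff (in cycles per pixel) and keeps the noise's own random phase above it. The abstract does not specify the construction, so the paper's FSS noise may differ; `fss_noise` is a hypothetical name.

```python
import numpy as np


def fss_noise(image: np.ndarray, cutoff: float, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical frequency-selective structured noise for a real (..., H, W) array.

    Magnitudes always come from white Gaussian noise. Phases come from `image` at
    radial frequencies <= `cutoff` (cycles/pixel) and from the noise above it.
    This is an assumed construction, not the authors' implementation.
    """
    h, w = image.shape[-2:]
    # Radial frequency of every FFT bin, in cycles per pixel.
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    keep_phase = np.sqrt(fx**2 + fy**2) <= cutoff  # low-pass mask of shape (H, W)

    img_f = np.fft.fft2(image, axes=(-2, -1))
    noise_f = np.fft.fft2(rng.standard_normal(image.shape), axes=(-2, -1))

    # Pick the phase per bin; the (H, W) mask broadcasts over leading channel axes.
    phase = np.where(keep_phase, np.angle(img_f), np.angle(noise_f))
    mixed = np.abs(noise_f) * np.exp(1j * phase)
    # The mask is symmetric under k -> -k, so `mixed` stays Hermitian and the
    # inverse FFT is real up to rounding error.
    return np.fft.ifft2(mixed, axes=(-2, -1)).real


img = np.random.default_rng(0).random((3, 64, 64))
loose = fss_noise(img, 0.05, np.random.default_rng(1))  # only coarse layout fixed
rigid = fss_noise(img, 1.0, np.random.default_rng(1))   # every bin keeps image phase
```

Under this assumption, a negative cutoff reproduces plain Gaussian noise, a cutoff above about 0.71 cycles/pixel (the largest radial frequency on the FFT grid) gives fully phase-preserving noise, and values in between trade structural rigidity for generative freedom. One plausible way to use it, consistent with the claim that no architectural change is needed, is to substitute this sample for ε in a standard forward step x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε.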