NeuralRemaster: Phase-Preserving Diffusion for Structure-Aligned Generation

📅 2025-12-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
Standard diffusion models corrupt images with Gaussian noise, whose random Fourier phases destroy spatial structure and limit their use in geometry-sensitive tasks such as re-rendering, simulation enhancement, and image-to-image translation. To address this, we propose Phase-Preserving Diffusion (φ-PD), a model-agnostic reformulation of the diffusion process that keeps the input's Fourier phase and randomizes only the magnitude spectrum. This enables structure-aligned generation without architectural changes or additional parameters. A companion Frequency-Selective Structured (FSS) noise uses a single frequency-cutoff parameter to control structural rigidity continuously. φ-PD adds no inference-time cost and is compatible with existing image and video diffusion models. Applied to the CARLA simulator, φ-PD improves CARLA-to-Waymo planner performance by 50%, and it produces controllable, spatially aligned results in photorealistic and stylized re-rendering.

📝 Abstract
Standard diffusion corrupts data using Gaussian noise whose Fourier coefficients have random magnitudes and random phases. While this is effective for unconditional or text-to-image generation, corrupting the phase components destroys spatial structure, making standard diffusion ill-suited for tasks requiring geometric consistency, such as re-rendering, simulation enhancement, and image-to-image translation. We introduce Phase-Preserving Diffusion (φ-PD), a model-agnostic reformulation of the diffusion process that preserves input phase while randomizing magnitude, enabling structure-aligned generation without architectural changes or additional parameters. We further propose Frequency-Selective Structured (FSS) noise, which provides continuous control over structural rigidity via a single frequency-cutoff parameter. φ-PD adds no inference-time cost and is compatible with any diffusion model for images or videos. Across photorealistic and stylized re-rendering, as well as sim-to-real enhancement for driving planners, φ-PD produces controllable, spatially aligned results. When applied to the CARLA simulator, φ-PD improves CARLA-to-Waymo planner performance by 50%. The method is complementary to existing conditioning approaches and broadly applicable to image-to-image and video-to-video generation. Videos, additional examples, and code are available on our project page (https://yuzeng-at-tri.github.io/ppd-page/).
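The abstract does not spell out how the phase-preserving noise is built. Below is a minimal NumPy sketch of one reading of "preserves input phase while randomizing magnitude", assuming the random magnitudes are taken from the FFT of ordinary Gaussian noise and that the noise is blended with the image in the usual DDPM style. The names `phase_preserving_noise`, `forward_step`, and `alpha_bar` are hypothetical illustrations, not the authors' code.

```python
import numpy as np


def phase_preserving_noise(x: np.ndarray, eps: np.ndarray) -> np.ndarray:
    """Hypothetical helper: Fourier magnitudes from eps, Fourier phases from x.

    For real x and eps, |EPS| * exp(i*angle(X)) is Hermitian-symmetric wherever
    the phase of X is defined, so the inverse FFT is essentially real.
    """
    X = np.fft.fft2(x)
    EPS = np.fft.fft2(eps)
    spectrum = np.abs(EPS) * np.exp(1j * np.angle(X))
    return np.fft.ifft2(spectrum).real  # discard the negligible imaginary residue


def forward_step(x0: np.ndarray, eps: np.ndarray, alpha_bar: float) -> np.ndarray:
    """Assumed DDPM-style blend with Gaussian noise swapped for phase-preserving
    noise: x_t = sqrt(a) * x0 + sqrt(1 - a) * noise."""
    noise = phase_preserving_noise(x0, eps)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * noise


rng = np.random.default_rng(0)
x0 = np.zeros((64, 64))
x0[16:40, 20:48] = 1.0  # toy "image": a bright rectangle
eps = rng.standard_normal(x0.shape)
xt = forward_step(x0, eps, alpha_bar=0.1)

# Compare phases only where the input spectrum is clearly non-zero
# (the phase is undefined at exact spectral zeros).
X0, Xt = np.fft.fft2(x0), np.fft.fft2(xt)
mask = np.abs(X0) > 1e-6 * np.abs(X0).max()
max_dev = np.abs(np.exp(1j * np.angle(Xt[mask])) - np.exp(1j * np.angle(X0[mask]))).max()
print(f"max phase deviation on {mask.sum()} bins: {max_dev:.2e}")
```

Because the image and the noise share the same phase at every frequency, their weighted sum keeps that phase too, so under this assumed blend x_t carries the input's Fourier phase (and hence its layout) at any noise level.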
Problem

Standard diffusion destroys spatial structure by corrupting Fourier phase components.
This makes it ill-suited for tasks requiring geometric consistency, such as re-rendering, simulation enhancement, and image-to-image translation.
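The phase-corruption argument can be checked numerically. The sketch below assumes the standard DDPM forward step x_t = sqrt(ᾱ)·x_0 + sqrt(1 − ᾱ)·ε on a toy rectangle image and measures how far the Fourier phase of x_t drifts from that of x_0. The helpers `gaussian_forward_step` and `mean_phase_error` are illustrative names, not from the paper.

```python
import numpy as np


def gaussian_forward_step(x0: np.ndarray, eps: np.ndarray, alpha_bar: float) -> np.ndarray:
    """Standard DDPM-style corruption: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps."""
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps


def mean_phase_error(x0: np.ndarray, xt: np.ndarray) -> float:
    """Mean absolute Fourier-phase difference between x0 and xt, in radians."""
    X0, Xt = np.fft.fft2(x0), np.fft.fft2(xt)
    # angle(Xt * conj(X0)) is the phase difference, already wrapped to [-pi, pi].
    return float(np.mean(np.abs(np.angle(Xt * np.conj(X0)))))


rng = np.random.default_rng(0)
x0 = np.zeros((64, 64))
x0[16:40, 20:48] = 1.0  # toy "image": a bright rectangle
eps = rng.standard_normal(x0.shape)
for alpha_bar in (1.0, 0.5, 0.1, 1e-4):
    err = mean_phase_error(x0, gaussian_forward_step(x0, eps, alpha_bar))
    print(f"alpha_bar={alpha_bar:g}: mean phase error {err:.3f} rad")
```

As ᾱ shrinks, the mean phase error approaches π/2 ≈ 1.57 rad, the expected absolute difference for uniformly random phases, which is the loss of spatial structure the problem statement refers to.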
Innovation

Phase-Preserving Diffusion preserves input phase while randomizing magnitude.
Frequency-Selective Structured noise enables continuous control over structural rigidity via a single frequency cutoff.
The method is model-agnostic, adds no inference-time cost, and applies to both image and video diffusion models.
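One hypothetical reading of FSS noise, sketched in NumPy: copy the input's phase only at spatial frequencies whose radius is at or below `cutoff` (in cycles per pixel) and let the noise keep its own random phase above it. The name `fss_noise`, the radial mask, and the hard (rather than smooth) cutoff are assumptions; the source only states that a single frequency-cutoff parameter controls structural rigidity.

```python
import numpy as np


def fss_noise(x: np.ndarray, eps: np.ndarray, cutoff: float) -> np.ndarray:
    """Hypothetical frequency-selective structured noise.

    Every frequency takes its magnitude from eps. Frequencies with radius
    <= cutoff (cycles/pixel; the largest radius on the grid is about 0.707)
    take their phase from x; higher frequencies keep the noise's own phase.
    """
    X = np.fft.fft2(x)
    EPS = np.fft.fft2(eps)
    fy = np.fft.fftfreq(x.shape[0])[:, None]  # vertical frequency per row
    fx = np.fft.fftfreq(x.shape[1])[None, :]  # horizontal frequency per column
    low = np.hypot(fx, fy) <= cutoff  # radially symmetric mask keeps the spectrum Hermitian
    phase = np.where(low, np.angle(X), np.angle(EPS))
    return np.fft.ifft2(np.abs(EPS) * np.exp(1j * phase)).real


rng = np.random.default_rng(0)
x0 = np.zeros((64, 64))
x0[16:40, 20:48] = 1.0  # toy "image": a bright rectangle
eps = rng.standard_normal(x0.shape)
loose = fss_noise(x0, eps, cutoff=0.05)  # only the coarsest layout follows x0
rigid = fss_noise(x0, eps, cutoff=0.5)   # most of the spectrum follows x0
```

Under this reading, a larger cutoff pins down the phase of more of the spectrum and therefore more of the layout. A negative cutoff returns the original Gaussian noise unchanged, and any cutoff above √0.5 ≈ 0.707 reduces to fully phase-preserving noise.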
Yu Zeng
Toyota Research Institute
Charles Ochoa
Toyota Research Institute
Mingyuan Zhou
University of Texas at Austin
Vishal M. Patel
Associate Professor, ECE, Johns Hopkins University
Image Processing, Computer Vision, Biometrics, Medical Image Analysis, Remote Sensing
Vitor Guizilini
Toyota Research Institute
Rowan McAllister
Waymo
Reinforcement Learning, Robotics, Machine Learning, Autonomous Vehicles