🤖 AI Summary
This work proposes LENS (Low-frequency Eigen Noise Shaping), an efficient noise modulation framework that improves the image quality of few-step distilled diffusion models without the heavy computational cost of test-time optimization. The authors argue, both theoretically and empirically, that modulating noise only within a low-dimensional, low-frequency subspace is sufficient, because these components largely determine the global structure and visual fidelity of generated images. Using a lightweight, standalone network for targeted noise modulation, LENS substantially reduces both model parameters and computational overhead. Experiments show that, compared to prior methods, LENS reduces FLOPs by 400–700×, model parameters by 25–75×, and inference-time overhead by 10–20×, while maintaining competitive image quality.
📝 Abstract
Distilled diffusion models accelerate image generation by reducing the number of denoising steps, but often suffer from degraded image quality. To mitigate this trade-off, test-time optimization methods improve quality, yet their iterative nature incurs substantial computational overhead and leads to slow inference, limiting practical usability. Recent hypernetwork-based approaches amortize this process during training, but still require costly noise modulation in high-dimensional latent spaces. In this work, we propose LENS (Low-frequency Eigen Noise Shaping), an efficient noise modulation framework that operates in a low-dimensional subspace. Our approach is motivated by the observation that low-frequency components of the noise largely determine the global structure and visual fidelity of generated images. Based on this observation, we provide a theoretical justification for restricting modulation to the low-frequency subspace and derive a principled training objective. Building on this, LENS employs a lightweight, standalone network to selectively modulate these components, enabling efficient and targeted noise modulation. Extensive experiments demonstrate that LENS achieves competitive image quality while reducing FLOPs by 400-700$\times$, model parameters by 25-75$\times$, and inference-time overhead by 10-20$\times$ compared to prior methods.
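To make the idea of modulating only a low-frequency subspace concrete, here is a minimal sketch. The abstract does not say how the subspace is constructed, so this example assumes a 2D DCT basis purely for illustration; the "eigen" basis LENS actually uses may differ. The names `dct_matrix` and `modulate_low_freq` are hypothetical, and `delta` is a placeholder for whatever the lightweight network would predict.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix of shape (n, n); row k holds the k-th frequency."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    mat = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    mat[0, :] /= np.sqrt(2.0)  # rescale the DC row so the matrix is orthonormal
    return mat


def modulate_low_freq(noise: np.ndarray, delta: np.ndarray, k: int) -> np.ndarray:
    """Shift only the k x k lowest-frequency DCT coefficients of a 2D noise map.

    Assumption: a DCT basis stands in for the low-frequency subspace, and
    `delta` (shape k x k) stands in for the output of a small network.
    """
    h, w = noise.shape
    dh, dw = dct_matrix(h), dct_matrix(w)
    coeffs = dh @ noise @ dw.T   # forward 2D DCT
    coeffs[:k, :k] += delta      # touch the low-frequency block only
    return dh.T @ coeffs @ dw    # inverse 2D DCT (transpose, since orthonormal)


# Usage: a 64 x 64 noise map modulated through an 8 x 8 block of coefficients.
rng = np.random.default_rng(0)
noise = rng.standard_normal((64, 64))
delta = 0.1 * rng.standard_normal((8, 8))  # placeholder for a predicted update
shaped = modulate_low_freq(noise, delta, k=8)
```

In this toy setting the modulation has 64 degrees of freedom instead of 4,096, and every higher-frequency coefficient of the noise is left untouched. This is the kind of dimensionality reduction the abstract credits for the savings, although the figures reported there come from the authors' own setup, not from this sketch.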