Moment- and Power-Spectrum-Based Gaussianity Regularization for Text-to-Image Models

📅 2025-09-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing Gaussianity regularization methods for the latent spaces of text-to-image models suffer from high computational overhead and lack permutation invariance. Method: We propose a dual-domain regularization framework that jointly enforces moment matching in the spatial domain and power spectral density matching in the frequency domain. The loss is analytically derived from a standard Gaussian prior, is invariant under random input permutations, and imposes its spectral constraints with near-linear complexity. Contribution/Results: This work unifies moment-based and power-spectrum-based regularization within a single, theoretically grounded, and computationally efficient framework. In reward-driven downstream tasks, including aesthetic enhancement and text-alignment optimization, our method significantly outperforms existing Gaussianity regularizers, prevents reward hacking, and yields higher-fidelity image generation with faster convergence.
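The complexity claim above (spectral matching recovers the same structure as spatial covariance matching at lower cost) rests on the Wiener-Khinchin relation: the power spectrum is the Fourier transform of the circular autocorrelation. A minimal numpy sketch of that equivalence (an illustration of the general principle, not the paper's code):

```python
import numpy as np

def circular_autocorr_direct(x):
    """Circular autocorrelation computed in the spatial domain, O(n^2)."""
    n = len(x)
    # Entry k is sum_i x[i] * x[(i + k) mod n]
    return np.array([np.dot(x, np.roll(x, -k)) for k in range(n)])

def circular_autocorr_fft(x):
    """Same quantity via the power spectrum, O(n log n) (Wiener-Khinchin)."""
    return np.fft.ifft(np.abs(np.fft.fft(x)) ** 2).real
```

Both functions return identical values up to floating-point error, which is why a frequency-domain loss can stand in for a spatial covariance-matching loss at much lower cost.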

📝 Abstract
We propose a novel regularization loss that enforces standard Gaussianity, encouraging samples to align with a standard Gaussian distribution. This facilitates a range of downstream tasks involving optimization in the latent space of text-to-image models. We treat elements of a high-dimensional sample as one-dimensional standard Gaussian variables and define a composite loss that combines moment-based regularization in the spatial domain with power spectrum-based regularization in the spectral domain. Since the expected values of the moments and the power spectrum distribution are analytically known, the loss promotes conformity to these properties. To ensure permutation invariance, the losses are applied to randomly permuted inputs. Notably, existing Gaussianity-based regularizations fall within our unified framework: some correspond to moment losses of specific orders, while the previous covariance-matching loss is equivalent to our spectral loss but incurs higher time complexity due to its spatial-domain computation. We showcase the application of our regularization in generative modeling for test-time reward alignment with a text-to-image model, specifically to enhance aesthetics and text alignment. Our regularization outperforms previous Gaussianity regularizations, effectively prevents reward hacking, and accelerates convergence.
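The abstract's composite loss can be sketched as follows. This is a minimal numpy illustration under stated assumptions (moments up to order 4, a flat unit target power spectrum for i.i.d. N(0,1) samples, uniform loss weights); the paper's exact formulation may differ:

```python
import numpy as np

def gaussianity_loss(z, rng=None):
    """Sketch of a dual-domain Gaussianity regularizer: moment matching in
    the spatial domain plus power-spectrum matching in the frequency domain."""
    rng = np.random.default_rng() if rng is None else rng
    # Random permutation of the flattened sample for permutation invariance
    x = rng.permutation(np.ravel(z))
    n = x.size
    # Moment loss: raw moments of N(0,1) are 0, 1, 0, 3 for orders 1..4
    targets = [0.0, 1.0, 0.0, 3.0]
    moment_loss = sum((np.mean(x ** k) - t) ** 2
                      for k, t in zip(range(1, 5), targets))
    # Spectral loss: i.i.d. N(0,1) samples have a flat expected power
    # spectrum; normalize |FFT|^2 by n so the per-bin expectation is 1
    psd = np.abs(np.fft.rfft(x)) ** 2 / n
    spectral_loss = np.mean((psd - 1.0) ** 2)
    return moment_loss + spectral_loss
```

A standard-normal sample scores much lower under this loss than, say, a constant vector, whose energy collapses into the DC bin and whose moments deviate from the Gaussian targets.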
Problem

Research questions and friction points this paper is trying to address.

Enforcing standard Gaussianity in text-to-image model samples
Combining moment and power spectrum regularization approaches
Preventing reward hacking in generative model optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Moment-based regularization in spatial domain
Power spectrum-based regularization in spectral domain
Composite loss enforcing standard Gaussian distribution