Oracle Noise: Faster Semantic Spherical Alignment for Interpretable Latent Optimization

📅 2026-04-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the limitations of conventional Euclidean latent noise optimization in text-to-image diffusion models: unconstrained gradient ascent inflates the latent norm and breaks the standard Gaussian prior, producing artifacts such as color over-saturation and slowing semantic alignment. The authors reframe noise initialization as semantic-driven optimization confined to a Riemannian hypersphere. The method identifies, in a zero-shot manner and without external parsers, the most impactful structural words in the input prompt, then performs gradient updates along the spherical manifold. Because the noise never leaves the sphere, the Gaussian prior is preserved, norm inflation is eliminated, and large step sizes can be used for rapid convergence. The method is presented as the first to constrain latent noise to a spherical manifold, and it does not rely on black-box proxy models. Experiments report state-of-the-art results on human preference metrics (HPSv2, ImageReward), CLIP Score, and sample diversity, with Euclidean-induced degradation avoided, all within a 2-second optimization budget.

📝 Abstract

Text-to-image diffusion models have achieved remarkable generative capabilities, yet accurately aligning complex textual prompts with synthesized layouts remains an ongoing challenge. In these models, the initial Gaussian noise acts as a critical structural seed dictating the macroscopic layout. Recent online optimization and search methods attempt to refine this noise to enhance text-image alignment. However, relying on unconstrained Euclidean gradient ascent mathematically inflates the latent norm and destroys the standard Gaussian prior, causing severe visual artifacts like color over-saturation. Furthermore, these methods suffer from inefficient semantic routing and easily fall into the "reward hacking" trap of external proxy models. To address these intertwined bottlenecks, we propose Oracle Noise, a zero-shot framework reframing noise initialization as semantic-driven optimization strictly confined to a Riemannian hypersphere. Instead of relying on complex external parsers, we directly identify the most impactful structural words in the prompt to efficiently route optimization energy. By updating the noise strictly along a spherical path, we mathematically preserve the original Gaussian distribution. This geometric constraint eliminates norm inflation and unlocks aggressive step sizes for rapid convergence. Extensive experiments demonstrate that Oracle Noise significantly accelerates semantic alignment and achieves superior aesthetics without black-box models. It completely mitigates Euclidean-induced degradation, establishing state-of-the-art performance across human preference metrics (e.g., HPSv2, ImageReward), semantic alignment (CLIP Score), and sample diversity, all within a strict 2-second optimization budget.
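
The contrast between a Euclidean update and an update along a spherical path can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function name `spherical_step`, the toy objective (inner product with a random `target` vector), the step size, and the latent shape are all assumptions chosen for illustration. The sketch only shows the geometric point made in the abstract, namely that a tangent-projected, geodesic step keeps the latent norm fixed while a plain additive step inflates it.

```python
import numpy as np


def spherical_step(z, grad, step):
    """One gradient-ascent step along a great circle of the sphere of radius ||z||.

    Hypothetical helper for illustration; not taken from the paper.
    """
    r = np.linalg.norm(z)
    u = z / r
    # Remove the radial component so the update direction is tangent to the sphere.
    tangent = grad - np.dot(grad, u) * u
    t_norm = np.linalg.norm(tangent)
    if t_norm < 1e-12:
        return z.copy()  # gradient is purely radial: nothing to do on the sphere
    d = tangent / t_norm
    # Exponential map: rotate z towards d by an angle proportional to the step.
    theta = step * t_norm / r
    return r * (np.cos(theta) * u + np.sin(theta) * d)


rng = np.random.default_rng(0)
z0 = rng.standard_normal(4 * 64 * 64)    # flattened latent, shape assumed
target = rng.standard_normal(z0.size)    # toy stand-in for a semantic direction

z_euc, z_sph = z0.copy(), z0.copy()
for _ in range(10):
    # Toy objective f(z) = <z, target>, whose gradient is simply `target`.
    z_euc = z_euc + 0.1 * target
    z_sph = spherical_step(z_sph, target, 0.1)

print("initial norm:  ", np.linalg.norm(z0))
print("Euclidean norm:", np.linalg.norm(z_euc))
print("spherical norm:", np.linalg.norm(z_sph))
```

In this toy setting both updates increase the objective, but only the spherical one leaves the norm at its initial value. Keeping the norm fixed is the property demonstrated here; the real method's gradient comes from the diffusion model and the selected prompt words, which this sketch does not model.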

Problem

Research questions and friction points this paper is trying to address.

text-to-image alignment

latent optimization

Gaussian prior distortion

semantic routing

reward hacking

Innovation

Methods, ideas, or system contributions that make the work stand out.

Oracle Noise

Riemannian hypersphere

semantic alignment