HEART: Hyperspherical Embedding Alignment via Kent-Representation Traversal in Diffusion Models

📅 2026-05-08
📈 Citations: 0
Influential: 0

🤖 AI Summary
Existing text-to-image diffusion models struggle to control generated content precisely when conditioned on text alone. Edits often introduce unintended artifacts because current text-based methods ignore the geometric structure of the embedding space. This work proposes HEART, a framework built on the observation that text-encoder embeddings lie on a hypersphere, where concepts are better modeled by anisotropic Kent distributions than by linear directions. Building on this insight, the authors introduce a geometry-aware editing method that requires no fine-tuning, inversion, or optimization. By applying geodesic transformations on the hypersphere, the approach enables consistent subject replacement and fine-grained attribute manipulation while preserving the rest of the scene. It drops the conventional linear (Euclidean) assumption, generalizes across diffusion model architectures, and, according to the authors, makes image editing both more precise and faster.
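As a minimal sketch of what a "geodesic transformation on the hypersphere" means, the snippet below uses standard spherical linear interpolation (slerp) between two unit-normalized vectors and contrasts it with a plain linear blend. This is not the HEART procedure, which this page does not specify. The random vectors and the 768-dimensional width are hypothetical stand-ins for text-encoder embeddings of a source and a target concept.

```python
import numpy as np

def normalize(v: np.ndarray) -> np.ndarray:
    """Project a vector onto the unit hypersphere."""
    return v / np.linalg.norm(v)

def slerp(a: np.ndarray, b: np.ndarray, t: float) -> np.ndarray:
    """Point at fraction t along the great-circle geodesic from unit vector a to unit vector b.

    Undefined for exactly antipodal inputs (the geodesic is not unique there).
    """
    cos_omega = np.clip(np.dot(a, b), -1.0, 1.0)
    omega = np.arccos(cos_omega)  # angle between a and b
    if omega < 1e-8:  # nearly identical directions: nothing to traverse
        return a.copy()
    sin_omega = np.sin(omega)
    return (np.sin((1.0 - t) * omega) * a + np.sin(t * omega) * b) / sin_omega

rng = np.random.default_rng(0)
dim = 768  # hypothetical embedding width
src = normalize(rng.standard_normal(dim))  # stand-in "source concept" embedding
tgt = normalize(rng.standard_normal(dim))  # stand-in "target concept" embedding

mid_geodesic = slerp(src, tgt, 0.5)  # halfway along the sphere's surface
mid_linear = 0.5 * src + 0.5 * tgt   # halfway along the straight chord

print(np.linalg.norm(mid_geodesic))  # unit norm: stays on the hypersphere
print(np.linalg.norm(mid_linear))    # below 1: cuts through the sphere's interior
```

Slerp keeps every intermediate point at unit norm, while the linear midpoint shrinks toward the origin, which illustrates how a purely linear edit can leave the hypersphere on which the embeddings are said to lie.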
📝 Abstract
Text-to-image diffusion models can generate visually stunning images, yet controlling what appears and how it appears remains surprisingly difficult, especially when operating solely within the constraints of the text-conditioning space. For example, changing a subject or adjusting an attribute often leads to unintended side effects, such as altered backgrounds or distorted details. This is because most existing text-based control methods treat the embedding space as Euclidean and apply simple linear transformations, which do not reflect how semantic concepts are actually organized. In this work, we take a step back and ask: what is the true geometry of these embeddings? We find that text encoder representations lie on a hypersphere, where concepts are not linear directions but structured, anisotropic distributions better captured by Kent distributions. Building on this insight, we propose HEART, a training-free framework that performs Kent-aware geodesic transformations directly on the hypersphere. By respecting the underlying geometry, HEART enables intuitive and precise edits, such as consistent subject replacement and fine-grained attribute control, while preserving the original scene. Importantly, HEART requires no fine-tuning, inversion, or optimization, and generalizes across diffusion model architectures. Our results show that a simple shift in perspective, from linear to spherical, can unlock fast and controllable image generation.
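To make "anisotropic distributions better captured by Kent distributions" concrete, here is a sketch of the unnormalized log-density of the classical five-parameter Kent (FB5) distribution on the 2-sphere. The paper works with high-dimensional text embeddings, so its Kent model presumably generalizes this form; the 3-D setting, the orthonormal frame, and the values of kappa and beta below are illustrative assumptions only, and the function name is hypothetical.

```python
import numpy as np

def kent_log_density_unnormalized(x, gamma1, gamma2, gamma3, kappa, beta):
    """Unnormalized log-density of the classical Kent (FB5) distribution on S^2.

    gamma1: mean direction; gamma2, gamma3: major and minor axes (orthonormal frame).
    kappa: concentration; beta: ovalness, with 0 <= 2*beta < kappa for a unimodal density.
    """
    x = x / np.linalg.norm(x)  # evaluate on the unit sphere
    return kappa * (gamma1 @ x) + beta * ((gamma2 @ x) ** 2 - (gamma3 @ x) ** 2)

# Orthonormal frame: mean direction along z, major axis along x, minor axis along y
g1, g2, g3 = np.eye(3)[2], np.eye(3)[0], np.eye(3)[1]
kappa, beta = 20.0, 8.0  # illustrative values; 2*beta = 16 < kappa = 20

theta = 0.3  # same angular distance (radians) from the mean direction for both points
along_major = np.array([np.sin(theta), 0.0, np.cos(theta)])
along_minor = np.array([0.0, np.sin(theta), np.cos(theta)])

lp_major = kent_log_density_unnormalized(along_major, g1, g2, g3, kappa, beta)
lp_minor = kent_log_density_unnormalized(along_minor, g1, g2, g3, kappa, beta)
print(lp_major > lp_minor)  # density is stretched along the major axis
```

With beta > 0, two points equally far from the mean direction receive different densities depending on whether they lie along the major or the minor axis; setting beta = 0 recovers the isotropic von Mises-Fisher case, in which only the angle to the mean direction matters.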
Problem

Research questions and friction points this paper is trying to address.

text-to-image generation
embedding geometry
semantic control
diffusion models
hyperspherical representation
Innovation

Methods, ideas, or system contributions that make the work stand out.

hyperspherical embedding
Kent distribution
geodesic transformation
training-free editing
diffusion models