HyperAlign: Hyperbolic Entailment Cones for Adaptive Text-to-Image Alignment Assessment

📅 2026-01-08
🏛️ arXiv.org
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Existing text-to-image alignment evaluation methods predominantly rely on Euclidean space metrics, which struggle to capture semantic hierarchies and lack sample adaptivity. This work proposes the first adaptive alignment evaluation framework grounded in hyperbolic entailment geometry: CLIP features are mapped into hyperbolic space, where hyperbolic entailment cones model semantic inclusion relationships. A dynamic supervision mechanism coupled with an adaptive modulation regressor enables structure-aware, sample-level alignment score prediction. The proposed method achieves state-of-the-art performance in both in-domain evaluation and cross-dataset generalization, demonstrating the efficacy and advantages of hyperbolic geometry for assessing text-image alignment.
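
To make the geometry concrete, here is a minimal NumPy sketch of mapping a Euclidean feature into hyperbolic space and scoring it against an entailment cone. The summary does not say which hyperbolic model or cone definition HyperAlign uses. This sketch therefore assumes the unit-curvature Poincaré ball and the standard entailment-cone formulas of Ganea et al. (2018). The function names, the constant `K`, and the choice of the text embedding as cone apex are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def expmap0(v, eps=1e-9):
    """Exponential map at the origin of the unit-curvature Poincare ball.
    Sends a Euclidean (tangent) vector to a point with norm < 1."""
    norm = max(np.linalg.norm(v), eps)
    return np.tanh(norm) * v / norm

def half_aperture(x, K=0.1, eps=1e-9):
    """Half-aperture of the entailment cone with apex x (K is an assumed constant)."""
    nx = max(np.linalg.norm(x), eps)
    # Clip because the cone is only defined away from the origin.
    return np.arcsin(np.clip(K * (1.0 - nx**2) / nx, -1.0, 1.0))

def exterior_angle(x, y, eps=1e-9):
    """Angle at x between the cone axis (ray from the origin through x)
    and the geodesic from x to y."""
    xy = np.dot(x, y)
    nx2, ny2 = np.dot(x, x), np.dot(y, y)
    num = xy * (1.0 + nx2) - nx2 * (1.0 + ny2)
    den = (np.sqrt(nx2) * np.linalg.norm(x - y)
           * np.sqrt(max(1.0 + nx2 * ny2 - 2.0 * xy, 0.0)))
    return np.arccos(np.clip(num / max(den, eps), -1.0, 1.0))

def entailment_energy(x, y, K=0.1):
    """Zero when y lies inside the cone of x, positive otherwise."""
    return max(0.0, exterior_angle(x, y) - half_aperture(x, K))

# Toy 2-D example: the apex (assumed to be the text embedding) and two candidates.
text = np.array([0.5, 0.0])
image_inside = np.array([0.8, 0.0])    # same direction, farther from the origin
image_outside = np.array([-0.8, 0.0])  # opposite direction
print(entailment_energy(text, image_inside), entailment_energy(text, image_outside))
```

A continuous energy of this kind is one way a discrete "entails / does not entail" relation could become a differentiable geometric signal. How HyperAlign supervises it dynamically is not described here.
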

📝 Abstract
With the rapid development of text-to-image generation technology, accurately assessing the alignment between generated images and text prompts has become a critical challenge. Existing methods rely on Euclidean space metrics, which neglect the structured nature of semantic alignment and cannot adapt to individual samples. To address these limitations, we propose HyperAlign, an adaptive text-to-image alignment assessment framework based on hyperbolic entailment geometry. First, we extract Euclidean features using CLIP and map them to hyperbolic space. Second, we design a dynamic-supervision entailment modeling mechanism that transforms discrete entailment logic into continuous geometric structure supervision. Finally, we propose an adaptive modulation regressor that uses hyperbolic geometric features to generate sample-level modulation parameters, adaptively calibrating Euclidean cosine similarity to predict the final score. HyperAlign achieves highly competitive performance on both single-database evaluation and cross-database generalization tasks, validating the effectiveness of hyperbolic geometric modeling for image-text alignment assessment.
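
The third step, sample-level calibration of cosine similarity, can be sketched as follows. The abstract does not give the form of the modulation. This sketch assumes a simple affine scheme in which a linear layer maps hypothetical hyperbolic features `geo_feat` (for example a hyperbolic distance and a cone energy) to a positive scale `gamma` and a shift `beta`. The names `modulated_score`, `W`, and `b` are placeholders, not the paper's architecture.

```python
import numpy as np

def cosine_similarity(a, b, eps=1e-9):
    """Euclidean cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / max(np.linalg.norm(a) * np.linalg.norm(b), eps))

def modulated_score(img_feat, txt_feat, geo_feat, W, b):
    """Assumed affine calibration: score = gamma * cos(img, txt) + beta,
    where (gamma_raw, beta) = W @ geo_feat + b are predicted per sample."""
    gamma_raw, beta = W @ geo_feat + b
    gamma = np.log1p(np.exp(gamma_raw))  # softplus keeps the scale positive
    return float(gamma * cosine_similarity(img_feat, txt_feat) + beta)

# Toy usage with hand-set weights; in practice W and b would be learned.
img = np.array([1.0, 0.0])
txt = np.array([1.0, 0.0])
geo = np.array([0.3, 0.0, 0.15])            # hypothetical geometric features
W = np.zeros((2, 3))
b = np.array([np.log(np.e - 1.0), 0.0])     # softplus(b[0]) = 1, beta = 0
print(modulated_score(img, txt, geo, W, b))
```

With `W` set to zero the modulation ignores the geometry and the score reduces to plain cosine similarity. A learned `W` lets each sample's hyperbolic features rescale and shift that baseline.
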
Problem

Research questions and friction points this paper is trying to address.

text-to-image alignment
alignment assessment
semantic structure
adaptive evaluation
Euclidean metrics
Innovation

Methods, ideas, or system contributions that make the work stand out.

hyperbolic geometry
entailment cones
adaptive alignment assessment
text-to-image generation
geometric representation learning