Conjuring Semantic Similarity

📅 2024-10-21
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses two key challenges in text-generation evaluation: (1) the misalignment between conventional semantic-similarity metrics and human judgments, and (2) the limited interpretability of generated-text quality assessment. The authors propose a cross-modal semantic-similarity paradigm: each sentence is mapped to the image distribution it implicitly induces in a text-conditioned generative model, and similarity between two sentences is quantified as the Jensen–Shannon divergence between these distributions. Formally, semantic similarity is the distance between text-conditioned implicit image distributions, estimated via the reverse-time diffusion stochastic differential equations (SDEs) by Monte-Carlo sampling, without explicit image generation. Evaluated on standard benchmarks, the approach correlates strongly with human annotations (Pearson > 0.85), outperforming state-of-the-art text-embedding and paraphrase-based baselines, and establishes a generative-prior-based, interpretable cross-modal metric framework for text-generation evaluation.
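The summary's key computational ingredient is sampling from a text-conditioned distribution by integrating a reverse-time diffusion SDE. As a self-contained illustration only (not the paper's implementation), the sketch below recovers a known 1-D Gaussian data distribution by Euler–Maruyama integration of the reverse VP-SDE, using the analytically known score; in the paper, the score would instead come from a text-conditioned diffusion model, and `MU`, `S`, and the schedule are toy assumptions.

```python
import numpy as np

# Toy reverse-time diffusion: recover data ~ N(MU, S**2) from noise.
# Forward VP-SDE (constant beta = 1): dx = -0.5*x dt + dW, whose marginal
# at time t is N(MU*exp(-t/2), S**2*exp(-t) + 1 - exp(-t)). Here the score
# of that marginal is known in closed form; a real model would learn it.
MU, S = 2.0, 0.5  # hypothetical data mean and standard deviation

def score(x, t):
    m = MU * np.exp(-t / 2)                   # marginal mean at time t
    v = S**2 * np.exp(-t) + 1.0 - np.exp(-t)  # marginal variance at time t
    return -(x - m) / v                       # grad_x log p_t(x)

def reverse_sde_sample(n=20_000, T=8.0, steps=2_000, seed=0):
    rng = np.random.default_rng(seed)
    dt = T / steps
    x = rng.normal(0.0, 1.0, n)  # p_T is approximately N(0, 1)
    t = T
    for _ in range(steps):
        # Euler-Maruyama step of dx = [-0.5*x - score] dt + dW,
        # integrated backward in time from T to 0.
        x = x + (0.5 * x + score(x, t)) * dt + np.sqrt(dt) * rng.normal(size=n)
        t -= dt
    return x

samples = reverse_sde_sample()
print(samples.mean(), samples.std())  # close to MU = 2.0 and S = 0.5
```

The same backward pass also yields the per-sample likelihood terms the paper needs for its divergence estimate, which is why no images ever have to be decoded.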

📝 Abstract
The semantic similarity between sample expressions measures the distance between their latent 'meaning'. Such meanings are themselves typically represented by textual expressions, often insufficient to differentiate concepts at fine granularity. We propose a novel approach whereby the semantic similarity among textual expressions is based not on other expressions they can be rephrased as, but rather based on the imagery they evoke. While this is not possible with humans, generative models allow us to easily visualize and compare generated images, or their distribution, evoked by a textual prompt. Therefore, we characterize the semantic similarity between two textual expressions simply as the distance between image distributions they induce, or 'conjure.' We show that by choosing the Jensen-Shannon divergence between the reverse-time diffusion stochastic differential equations (SDEs) induced by each textual expression, this can be directly computed via Monte-Carlo sampling. Our method contributes a novel perspective on semantic similarity that not only aligns with human-annotated scores, but also opens up new avenues for the evaluation of text-conditioned generative models while offering better interpretability of their learnt representations.
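The abstract's central quantity, a Jensen-Shannon divergence computed via Monte-Carlo sampling, can be illustrated on a toy case where both densities are known in closed form. The sketch below uses 1-D Gaussians as stand-ins for the two text-induced image distributions; the function name and parameters are illustrative, not from the paper.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma**2) evaluated at x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def js_divergence_mc(mu_p, mu_q, sigma=1.0, n=200_000, seed=0):
    """Monte-Carlo Jensen-Shannon divergence between N(mu_p, sigma**2)
    and N(mu_q, sigma**2): JS = 0.5*KL(P||M) + 0.5*KL(Q||M), M = (P+Q)/2,
    with each KL estimated from samples of its own distribution."""
    rng = np.random.default_rng(seed)
    xp = rng.normal(mu_p, sigma, n)  # samples from P
    xq = rng.normal(mu_q, sigma, n)  # samples from Q
    p_xp, q_xp = normal_pdf(xp, mu_p, sigma), normal_pdf(xp, mu_q, sigma)
    p_xq, q_xq = normal_pdf(xq, mu_p, sigma), normal_pdf(xq, mu_q, sigma)
    kl_pm = np.mean(np.log(p_xp / (0.5 * (p_xp + q_xp))))  # E_P[log p/m]
    kl_qm = np.mean(np.log(q_xq / (0.5 * (p_xq + q_xq))))  # E_Q[log q/m]
    return 0.5 * (kl_pm + kl_qm)

print(js_divergence_mc(0.0, 0.0))  # identical distributions -> 0
print(js_divergence_mc(0.0, 4.0))  # well separated -> near ln 2 ~ 0.693
```

The JS divergence is bounded above by ln 2, so the estimate saturates as the two "conjured" distributions stop overlapping; in the paper, the Gaussian samplers above are replaced by reverse-diffusion SDE trajectories conditioned on each prompt.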
Problem

Research questions and friction points this paper is trying to address.

Semantic Similarity
Subjective Judgment
Sentence Quality Evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Semantic Similarity
Imagery Comparison
Jensen-Shannon Divergence