FLARE: Task-agnostic embedding model evaluation through normalizing flows

📅 2026-04-19

🤖 AI Summary
This work addresses the instability of existing unsupervised embedding evaluation methods in high-dimensional spaces when task-specific labels are unavailable. The authors propose FLARE, a novel label-free evaluation framework that leverages normalizing flows to directly model log-likelihood as a proxy for information sufficiency, thereby circumventing conventional distance-based density estimation. Notably, this is the first application of normalizing flows to embedding model evaluation. Theoretical analysis demonstrates that FLARE’s estimation error depends solely on the intrinsic dimensionality of the data manifold, not the ambient embedding dimension. Empirical results across 11 datasets and 8 embedding models show that FLARE achieves a Spearman correlation of 0.90 with supervised ground-truth rankings and significantly outperforms existing unsupervised baselines—particularly in high-dimensional settings (d ≥ 3,584)—while delivering stable and reliable assessments.
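The reported agreement with supervised rankings is a Spearman rank correlation. A minimal sketch of that comparison follows, assuming one label-free score and one supervised score per embedding model. The function name `spearman_rho` and the numbers in the usage lines are hypothetical and made up for illustration; they are not results from the paper. The sketch also assumes there are no tied scores.

```python
import numpy as np

def spearman_rho(scores_a, scores_b):
    """Spearman rank correlation, assuming no tied values in either input."""
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    # argsort of argsort turns values into 0-based ranks (valid without ties).
    rank_a = np.argsort(np.argsort(a))
    rank_b = np.argsort(np.argsort(b))
    # Spearman's rho is the Pearson correlation of the two rank vectors.
    return float(np.corrcoef(rank_a, rank_b)[0, 1])

# Made-up scores for five hypothetical embedding models.
label_free_scores = [1.0, 2.0, 3.0, 4.0, 5.0]   # e.g. a label-free metric
supervised_scores = [2.0, 1.0, 4.0, 3.0, 5.0]   # e.g. downstream accuracy
rho = spearman_rho(label_free_scores, supervised_scores)
```

A value near 1 means the label-free metric orders the models almost exactly as the supervised benchmark does. With tied scores, an average-rank method would be needed instead of the double `argsort`.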

📝 Abstract
When task-specific labels are unavailable, selecting an embedding model for a target corpus is difficult. Existing label-free metrics based on kernel estimators or Gaussian mixtures break down in high-dimensional spaces, producing unstable rankings. We propose flow-based label-free representation embedding evaluation (FLARE), which uses normalizing flows to estimate information sufficiency directly from log-likelihoods, avoiding distance-based density estimation. We derive a finite-sample bound showing that the estimation error depends on the intrinsic dimension of the data manifold rather than the ambient embedding dimension. Across 11 datasets and 8 embedders, FLARE reaches a Spearman's $\rho$ of 0.90 against supervised benchmarks and remains stable on high-dimensional embeddings ($d \geq 3{,}584$), where existing label-free baselines collapse.
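To make the log-likelihood idea concrete, the sketch below scores points with the change-of-variables formula $\log p(x) = \log p_Z(f(x)) + \log\lvert\det \partial f / \partial x\rvert$ for the simplest possible flow, a single affine map. This is a toy stand-in and not the authors' implementation: a practical normalizing flow stacks learned nonlinear invertible layers, and the paper's exact definition of information sufficiency is not reproduced here. The names `fit_affine_flow` and `log_likelihood` are hypothetical.

```python
import numpy as np

def fit_affine_flow(x, eps=1e-6):
    """Fit z = L^{-1}(x - mu), where L is the Cholesky factor of the
    regularized sample covariance. Assumes x has shape (n, d) with d >= 2."""
    mu = x.mean(axis=0)
    cov = np.cov(x, rowvar=False) + eps * np.eye(x.shape[1])
    chol = np.linalg.cholesky(cov)  # lower-triangular L with cov = L @ L.T
    return mu, chol

def log_likelihood(x, mu, chol):
    """Per-sample log p(x) under the affine flow with a standard normal base."""
    d = x.shape[1]
    # Forward pass of the flow: z = L^{-1}(x - mu).
    z = np.linalg.solve(chol, (x - mu).T).T
    # Log-density of z under the standard normal base distribution.
    log_base = -0.5 * (z ** 2).sum(axis=1) - 0.5 * d * np.log(2 * np.pi)
    # log|det dz/dx| = -sum(log diag(L)) because L is triangular.
    log_det = -np.log(np.diag(chol)).sum()
    return log_base + log_det

# Usage on synthetic "embeddings": fit on one split, score a held-out split.
rng = np.random.default_rng(0)
train = rng.normal(size=(500, 4))
held_out = rng.normal(size=(200, 4))
mu, chol = fit_affine_flow(train)
mean_ll = log_likelihood(held_out, mu, chol).mean()
```

The point of the design is that the score comes from an exact density formula and not from pairwise distances between points, which is the property the abstract credits for stability in high dimensions.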
Problem

Research questions and friction points this paper is trying to address.

embedding model evaluation
label-free evaluation
high-dimensional embeddings
representation learning
model selection
Innovation

Methods, ideas, or system contributions that make the work stand out.

flow-based model
label-free evaluation
embedding assessment
intrinsic dimension
normalizing flows