🤖 AI Summary
This work addresses the limitation that existing sentence encoder evaluations overly rely on downstream tasks and lack task-agnostic assessment of fundamental compositional operations—specifically set-theoretic ones (intersection, union, difference). To this end, we propose the first set-theory-based, interpretable compositional evaluation paradigm. Methodologically, we formally define text-level set operations—TextOverlap, TextDifference, and TextUnion—and construct a dedicated benchmark comprising 192,000 samples. We further design six white-box, decomposable set-theoretic criteria for quantitative evaluation. Experiments span seven traditional encoders and nine LLM-based encoders. Results show that SBERT significantly outperforms all LLM encoders across all six criteria. This work establishes a theoretical framework, standardized evaluation protocol, and open-source benchmark for studying compositional properties of sentence embeddings.
📝 Abstract
Sentence encoders play a pivotal role in various NLP tasks; hence, an accurate evaluation of their compositional properties is paramount. However, existing evaluation methods predominantly focus on task-specific downstream performance, leaving a significant gap in understanding how well sentence embeddings exhibit fundamental compositional properties in a task-independent context. Leveraging classical set theory, we address this gap by proposing six criteria based on three core "set-like" compositions/operations: *TextOverlap*, *TextDifference*, and *TextUnion*. We systematically evaluate 7 classical and 9 Large Language Model (LLM)-based sentence encoders to assess their alignment with these criteria. Our findings show that SBERT consistently demonstrates set-like compositional properties, surpassing even the latest LLMs. Additionally, we introduce a new dataset of ~192K samples designed to facilitate future benchmarking on the set-like compositionality of sentence embeddings.
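The paper's six criteria are not reproduced above, but the general idea of a "set-like" check can be sketched with a toy example. In the sketch below, everything is a hypothetical stand-in, not the paper's actual method: the bag-of-words `embed` replaces a real sentence encoder (e.g. SBERT), and the element-wise minimum serves as an intersection analogue. The check asks whether the embedding of the overlapping text aligns with the intersection of the two input embeddings:

```python
import math
from collections import Counter


def embed(text):
    """Toy 'encoder': a bag-of-words count vector. A real evaluation
    would use a sentence encoder such as SBERT instead."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u)
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def overlap_score(a, b, overlap_text):
    """Hypothetical TextOverlap-style criterion: does embed(overlap_text)
    align with the element-wise minimum (an intersection analogue) of
    embed(a) and embed(b)?"""
    ea, eb = embed(a), embed(b)
    intersection = Counter({k: min(ea[k], eb[k]) for k in ea if k in eb})
    return cosine(embed(overlap_text), intersection)


a = "the cat sat on the mat"
b = "the cat chased a mouse"
print(overlap_score(a, b, "the cat"))  # high: "the cat" is the shared content
print(overlap_score(a, b, "mouse mat"))  # low: not shared between a and b
```

With the toy encoder, a criterion like this is satisfied trivially; the paper's point is to test whether learned sentence embeddings satisfy analogous white-box criteria.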