Do We Need Domain-Specific Embedding Models? An Empirical Investigation

📅 2024-09-27
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
Generic embedding models exhibit uncharacterized performance degradation when applied to domain-specific tasks, particularly in finance, due to misalignment between general-purpose evaluation benchmarks and domain semantics.

Method: To address the lack of domain-adaptive evaluation, we introduce FinMTEB, the first financial-domain-specific embedding benchmark, and propose four novel metrics quantifying semantic complexity in financial text.

Contribution/Results: Empirical evaluation reveals that seven state-of-the-art generic embedding models suffer significant average performance drops on FinMTEB compared to their scores on the general-purpose MTEB benchmark. Crucially, model rankings on MTEB and FinMTEB are nearly uncorrelated (Pearson *r* ≈ 0.08), demonstrating poor transferability of generic embedding capabilities to domain-specific semantic modeling. This work constitutes the first systematic investigation into domain adaptation bottlenecks for embedding models, establishes the necessity of domain-specialized evaluation, and provides both a methodological framework and an empirical benchmark to guide the development of vertical-domain embedding models.
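The near-zero correlation claim refers to comparing models' scores across the two benchmarks. A minimal sketch of that comparison, using made-up stand-in scores (not the paper's actual numbers) for seven models:

```python
# Illustrative only: Pearson correlation between per-model benchmark scores.
# The score lists below are hypothetical placeholders, not results from the paper.
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# One average score per model on each benchmark (hypothetical values).
mteb_scores    = [64.2, 62.8, 61.5, 60.9, 59.7, 58.3, 57.1]
finmteb_scores = [48.1, 51.3, 47.2, 52.0, 49.5, 50.8, 46.9]

print(f"Pearson r across benchmarks: {pearson_r(mteb_scores, finmteb_scores):.2f}")
```

A value near 0 means a model's rank on MTEB says little about its rank on FinMTEB, which is the paper's argument for domain-specific evaluation.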

📝 Abstract
Embedding models play a crucial role in representing and retrieving information across various NLP applications. Recent advancements in Large Language Models (LLMs) have further enhanced the performance of embedding models, which are trained on massive amounts of text covering almost every domain. These models are often benchmarked on general-purpose datasets like Massive Text Embedding Benchmark (MTEB), where they demonstrate superior performance. However, a critical question arises: Is the development of domain-specific embedding models necessary when general-purpose models are trained on vast corpora that already include specialized domain texts? In this paper, we empirically investigate this question, choosing the finance domain as an example. We introduce the Finance Massive Text Embedding Benchmark (FinMTEB), a counterpart to MTEB that consists of financial domain-specific text datasets. We evaluate the performance of seven state-of-the-art embedding models on FinMTEB and observe a significant performance drop compared to their performance on MTEB. To account for the possibility that this drop is driven by FinMTEB's higher complexity, we propose four measures to quantify dataset complexity and control for this factor in our analysis. Our analysis provides compelling evidence that state-of-the-art embedding models struggle to capture domain-specific linguistic and semantic patterns. Moreover, we find that the performance of general-purpose embedding models on MTEB is not correlated with their performance on FinMTEB, indicating the need for domain-specific embedding benchmarks for domain-specific embedding models. This study sheds light on developing domain-specific embedding models in the LLM era. FinMTEB comes with open-source code at https://github.com/yixuantt/FinMTEB
Problem

Research questions and friction points this paper is trying to address.

Evaluating domain-specific embedding models
Comparing general-purpose vs. finance-specific embeddings
Developing benchmarks for domain-specific NLP tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Developed Finance Massive Text Embedding Benchmark
Proposed four dataset complexity measures
Evaluated embedding models on domain-specific texts
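The paper's four complexity measures are not detailed on this page, so the following is a loose, hypothetical illustration of the general idea: quantifying how a domain corpus differs in surface complexity from general text. It uses two common proxies (average token count per text and type-token ratio), which may or may not resemble the paper's actual metrics.

```python
# Hypothetical complexity proxies; not the paper's four measures.
def avg_length(texts):
    """Mean number of whitespace-separated tokens per text."""
    return sum(len(t.split()) for t in texts) / len(texts)

def type_token_ratio(texts):
    """Distinct tokens divided by total tokens, pooled over the dataset."""
    tokens = [tok.lower() for t in texts for tok in t.split()]
    return len(set(tokens)) / len(tokens)

general = ["The cat sat on the mat.", "Dogs bark loudly."]
finance = [
    "The issuer's EBITDA margin compressed amid covenant renegotiation.",
    "Duration hedging offsets parallel shifts in the yield curve.",
]

print(avg_length(general), type_token_ratio(general))
print(avg_length(finance), type_token_ratio(finance))
```

Controlling for measures like these lets the authors argue the FinMTEB performance drop reflects domain semantics rather than mere dataset difficulty.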
Yixuan Tang
The Hong Kong University of Science and Technology
Yi Yang
The Hong Kong University of Science and Technology