🤖 AI Summary
This work addresses the evaluation and domain adaptation of sentence embedding models for document retrieval in specialized domains (telecom, health, science), where the abundance of available models makes selection difficult. Method: Embeddings from publicly available models and their domain-adapted variants are evaluated on both point retrieval accuracies and their bootstrapped 95% confidence intervals; a systematic procedure estimates similarity-score thresholds for each embedding; and new metrics measure the distributional overlap of top-$K$, correct, and random document similarities with the question. Contribution/Results: Fine-tuning improves mean bootstrapped accuracy and tightens confidence intervals. The proposed overlap metrics correlate with both retrieval accuracy and similarity thresholds, whereas isotropy, measured with two independent state-of-the-art metrics, is poorly correlated with retrieval performance. Embeddings of domain-specific sentences overlap little with domain-agnostic ones, and fine-tuning moves them further apart. The study delivers a reusable evaluation methodology and practical recommendations for domain-adaptive sentence embedding in technical domains.
📝 Abstract
The plethora of sentence embedding models makes it challenging to choose one, especially for technical domains rich in specialized vocabulary. In this work, we domain-adapt embeddings using telecom, health, and science datasets for question answering. We evaluate embeddings obtained from publicly available models and their domain-adapted variants on both point retrieval accuracies and their 95% confidence intervals. We establish a systematic method to obtain similarity-score thresholds for different embeddings. As expected, we observe that fine-tuning improves mean bootstrapped accuracies. We also observe that it results in tighter confidence intervals, which improve further when fine-tuning is preceded by pre-training. We introduce metrics which measure the distributional overlaps of top-$K$, correct, and random document similarities with the question. Further, we show that these metrics are correlated with retrieval accuracy and similarity thresholds. Recent literature reports conflicting effects of isotropy on retrieval accuracy. Our experiments establish that the isotropy of embeddings (as measured by two independent state-of-the-art isotropy metric definitions) is poorly correlated with retrieval performance. We show that embeddings for domain-specific sentences have little overlap with those for domain-agnostic ones, and fine-tuning moves them further apart. Based on our results, we provide recommendations for researchers and practitioners on using our methodology and metrics.
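To make the evaluation quantities concrete, the sketch below shows one way to compute a bootstrapped mean retrieval accuracy with a 95% confidence interval, and a simple histogram-overlap coefficient between two similarity-score samples (e.g., top-$K$ vs. random document similarities). This is an illustrative sketch, not the paper's exact implementation; function names, the number of bootstrap resamples, and the binning scheme are assumptions.

```python
import numpy as np

def bootstrap_accuracy_ci(correct, n_boot=1000, alpha=0.05, seed=0):
    """Bootstrapped mean retrieval accuracy with a (1 - alpha) CI.

    `correct` is a 0/1 array: 1 if the correct document was retrieved
    for a question, 0 otherwise. (Illustrative sketch, not the paper's
    exact procedure.)
    """
    rng = np.random.default_rng(seed)
    correct = np.asarray(correct)
    n = len(correct)
    # Resample questions with replacement and record each resample's accuracy.
    means = np.array([
        rng.choice(correct, size=n, replace=True).mean()
        for _ in range(n_boot)
    ])
    lo, hi = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return means.mean(), (lo, hi)

def distribution_overlap(sims_a, sims_b, bins=50):
    """Histogram-overlap coefficient between two similarity-score samples.

    Returns a value in [0, 1]: 0 = disjoint supports, 1 = identical
    histograms. One simple choice of overlap measure; the paper's
    metric may differ in detail.
    """
    sims_a, sims_b = np.asarray(sims_a), np.asarray(sims_b)
    # Shared bin edges spanning both samples, so the histograms are comparable.
    lo = min(sims_a.min(), sims_b.min())
    hi = max(sims_a.max(), sims_b.max())
    edges = np.linspace(lo, hi, bins + 1)
    pa, _ = np.histogram(sims_a, bins=edges)
    pb, _ = np.histogram(sims_b, bins=edges)
    pa = pa / pa.sum()
    pb = pb / pb.sum()
    # Overlap = total probability mass shared bin-by-bin.
    return np.minimum(pa, pb).sum()
```

Under this sketch, a small overlap between the correct-document and random-document similarity distributions indicates a well-separated embedding, and the point where the two histograms cross suggests a similarity threshold.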