Fine-tuning Small Language Models as Efficient Enterprise Search Relevance Labelers

📅 2026-01-06
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the scarcity of high-quality relevance annotations in enterprise search, which hinders effective model training. To overcome this challenge, the authors propose an efficient approach combining synthetic data generation and knowledge distillation. Specifically, they employ a large language model (LLM) to generate queries from seed documents, use BM25 to retrieve hard negative samples, and construct a high-quality synthetic dataset by having a teacher LLM score the resulting query-document pairs. A small language model (SLM) is then fine-tuned via knowledge distillation to serve as a lightweight relevance annotator. Evaluated on 923 human-annotated enterprise query-document pairs, the distilled SLM matches or exceeds the teacher LLM's agreement with human judgments, while delivering 17× higher inference throughput and cutting cost to 1/19 of the original, a compelling balance between performance and efficiency.
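The distillation data described above pairs each query-document example with a teacher-assigned relevance label, which the student SLM is then fine-tuned to reproduce. The sketch below illustrates one plausible record format for that step; the prompt template, the 0-3 grading scale, and the `stub_teacher` overlap heuristic are all illustrative assumptions, not the paper's actual prompt or teacher model.

```python
import json

def build_distillation_record(query, doc, teacher_score):
    """Pack one query-document pair plus the teacher's relevance label
    into a prompt/completion record for fine-tuning the student SLM.
    The prompt wording and 0-3 scale are illustrative assumptions."""
    prompt = (
        "Rate the relevance of the document to the query on a 0-3 scale.\n"
        f"Query: {query}\nDocument: {doc}\nScore:"
    )
    return {"prompt": prompt, "completion": str(teacher_score)}

def stub_teacher(query, doc):
    """Stand-in for the teacher LLM: crude token overlap clipped to a
    0-3 grade, purely so this example runs end to end."""
    overlap = len(set(query.lower().split()) & set(doc.lower().split()))
    return min(overlap, 3)

pairs = [
    ("sales revenue report", "quarterly revenue report for the sales team"),
    ("sales revenue report", "cafeteria lunch menu for the week"),
]
records = [build_distillation_record(q, d, stub_teacher(q, d)) for q, d in pairs]
jsonl = "\n".join(json.dumps(r) for r in records)
print(jsonl)
```

In practice the stub would be replaced by a call to the teacher LLM, and the JSONL records fed to whatever supervised fine-tuning pipeline the SLM uses.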

📝 Abstract
In enterprise search, building high-quality datasets at scale remains a central challenge due to the difficulty of acquiring labeled data. To resolve this challenge, we propose an efficient approach to fine-tune small language models (SLMs) for accurate relevance labeling, enabling high-throughput, domain-specific labeling with quality comparable to or better than that of state-of-the-art large language models (LLMs). To overcome the lack of high-quality and accessible datasets in the enterprise domain, our method leverages synthetic data generation. Specifically, we employ an LLM to synthesize realistic enterprise queries from a seed document, apply BM25 to retrieve hard negatives, and use a teacher LLM to assign relevance scores. The resulting dataset is then distilled into an SLM, producing a compact relevance labeler. We evaluate our approach on a high-quality benchmark consisting of 923 enterprise query-document pairs annotated by trained human annotators, and show that the distilled SLM achieves agreement with human judgments on par with or better than the teacher LLM. Furthermore, our fine-tuned labeler substantially improves throughput, achieving a 17× increase while also being 19× more cost-effective. This approach enables scalable and cost-effective relevance labeling for enterprise-scale retrieval applications, supporting rapid offline evaluation and iteration in real-world settings.
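The abstract's hard-negative mining step, rank the corpus against each synthetic query with BM25 and keep the top non-positive hits, can be sketched as follows. This is a minimal Okapi BM25 implementation with whitespace tokenization; the corpus, query, `k1`/`b` values, and `top_k` cutoff are illustrative assumptions, not the paper's settings.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenized document against the query."""
    n = len(docs_tokens)
    avgdl = sum(len(d) for d in docs_tokens) / n
    df = Counter()  # document frequency of each term
    for d in docs_tokens:
        for t in set(d):
            df[t] += 1
    scores = []
    for d in docs_tokens:
        tf = Counter(d)
        s = 0.0
        for t in query_tokens:
            if t not in tf:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

def mine_hard_negatives(query, seed_doc, corpus, top_k=2):
    """Return the top-k BM25-ranked documents, excluding the seed
    (positive) document the query was generated from."""
    docs = [d for d in corpus if d != seed_doc]
    toks = [d.lower().split() for d in docs]
    scores = bm25_scores(query.lower().split(), toks)
    ranked = sorted(zip(scores, docs), key=lambda x: -x[0])
    return [d for _, d in ranked[:top_k]]

corpus = [
    "quarterly revenue report for the sales team",
    "sales team onboarding checklist and training guide",
    "cafeteria lunch menu for the week",
    "annual revenue forecast and sales projections",
]
negatives = mine_hard_negatives("sales revenue report", corpus[0], corpus)
print(negatives)
```

The highest-scoring non-positive documents share vocabulary with the query without being the seed document, which is exactly what makes them "hard" negatives for the teacher LLM to grade.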
Problem

Research questions and friction points this paper is trying to address.

enterprise search, relevance labeling, labeled data scarcity, dataset construction, information retrieval
Innovation

Methods, ideas, or system contributions that make the work stand out.

small language models, synthetic data generation, relevance labeling, knowledge distillation, enterprise search
Authors

Yue Kang (Microsoft)
Zhuoyi Huang (Microsoft)
Benji Schussheim (Microsoft)
Diana Licon (Microsoft)
Dina Atia (Microsoft)
Shixing Cao (Microsoft)
Jacob Danovitch (Microsoft)
Kunho Kim (KAIST): Computer Vision, Computer Graphics
Billy Norcilien (Microsoft)
Jonah Karpman (Microsoft)
Mahmound Sayed (Microsoft)
Mike Taylor (Microsoft)
Tao Sun (Microsoft)
Pavel Metrikov (Microsoft): Machine Learning, Information Retrieval, Statistical Data Analysis, Sponsored Search, Geo-informatics
Vipul Agarwal (Amazon)
Chris Quirk (Researcher, Microsoft Research): Natural Language Processing, Machine Translation, Computational Linguistics
Ye-Yi Wang (Microsoft)
Nick Craswell (Microsoft): Information Retrieval, Web Search, Conversational AI
Irene Shaffer (Microsoft)
Tianwei Chen (Microsoft)
Sulaiman Vesal (Microsoft): Deep Learning, Machine Learning, LLM/SLM, VLM
Soundar Srinivasan (Microsoft)