Semantic Search At LinkedIn

📅 2026-02-07

🤖 AI Summary

This work presents the semantic search framework behind LinkedIn's AI Job Search and AI People Search, which retrieves and ranks by meaning rather than keyword overlap using large language models (LLMs) under strict latency constraints. The framework combines an LLM relevance judge, embedding-based retrieval, and a compact student model trained via multi-teacher distillation to jointly optimize relevance and engagement. It also introduces a prefill-oriented inference architecture co-designed with model pruning, context compression, and hybrid text-embedding interactions. Under a fixed latency constraint, the system raises ranking throughput by over 75× while preserving near-teacher-level NDCG. The authors describe it as one of the first production LLM-based ranking systems with efficiency comparable to traditional approaches, and report significant gains in search quality and user engagement.
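To make the multi-teacher distillation idea concrete, the sketch below shows one common way a student ranker can be trained against several teachers at once. It is an illustration only: the summary does not specify the loss, so the listwise KL formulation, the teacher names (`relevance`, `engagement`), the weights, and the function names are all assumptions, not the authors' method.

```python
import math

def softmax(scores, temperature=1.0):
    """Turn raw scores for one candidate list into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two distributions over the same candidates."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def multi_teacher_loss(student_scores, teacher_scores, weights, temperature=1.0):
    """Weighted sum of KL terms, one per teacher (hypothetical formulation).

    teacher_scores: dict mapping a teacher name to its scores for the candidates.
    weights: dict mapping the same names to non-negative mixing weights.
    """
    q = softmax(student_scores, temperature)
    loss = 0.0
    for name, scores in teacher_scores.items():
        p = softmax(scores, temperature)
        loss += weights[name] * kl_divergence(p, q)
    return loss

# Hypothetical scores for three candidates from two assumed teachers.
teachers = {"relevance": [2.0, 1.0, 0.1], "engagement": [1.5, 1.8, 0.2]}
mix = {"relevance": 0.7, "engagement": 0.3}
loss = multi_teacher_loss([1.0, 1.0, 1.0], teachers, mix)
```

The mixing weights are where a trade-off between relevance and engagement would be expressed in this formulation; how the actual system balances the two is not stated here.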

📝 Abstract

Semantic search with large language models (LLMs) enables retrieval by meaning rather than keyword overlap, but scaling it requires major inference efficiency advances. We present LinkedIn's LLM-based semantic search framework for AI Job Search and AI People Search, combining an LLM relevance judge, embedding-based retrieval, and a compact Small Language Model trained via multi-teacher distillation to jointly optimize relevance and engagement. A prefill-oriented inference architecture co-designed with model pruning, context compression, and text-embedding hybrid interactions boosts ranking throughput by over 75x under a fixed latency constraint while preserving near-teacher-level NDCG, enabling one of the first production LLM-based ranking systems with efficiency comparable to traditional approaches and delivering significant gains in quality and user engagement.
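For readers unfamiliar with the metric, NDCG (normalized discounted cumulative gain) compares a ranking's discounted gain with that of the ideal ordering of the same items. The sketch below uses the common exponential-gain variant; the abstract does not say which gain function or cutoff the authors use, so both are assumptions, and the function names are illustrative.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k items, in ranked order."""
    return sum(
        (2 ** rel - 1) / math.log2(rank + 2)  # rank is 0-based, so discount is log2(position + 1)
        for rank, rel in enumerate(relevances[:k])
    )

def ndcg_at_k(relevances, k):
    """DCG of the given ranking divided by the DCG of the ideal ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Graded relevance labels of the results, in the order the ranker returned them.
score = ndcg_at_k([3, 2, 3, 0, 1], k=5)
```

A ranking that already places items in descending relevance order scores 1.0, so "near-teacher-level NDCG" means the student's orderings lose little of the teacher's score on this scale.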

Problem

Research questions and friction points this paper is trying to address.

semantic search

large language models

inference efficiency

relevance ranking

scalability

Innovation

Methods, ideas, or system contributions that make the work stand out.

semantic search

large language models

model distillation