AI Summary
This work proposes a semantic retrieval framework for LinkedIn's job and talent search, designed to replace traditional keyword matching with large language model (LLM)-driven semantic search under strict latency constraints. The framework integrates LLM-based relevance scoring, embedding-based retrieval, and a compact student model trained via multi-teacher distillation. It further introduces a prefill-oriented inference architecture co-designed with model pruning, context compression, and a hybrid text-embedding interaction mechanism. The resulting system improves ranking throughput by over 75× under a fixed latency constraint while preserving near-teacher NDCG. The authors present it as one of the first production-scale LLM-based ranking systems with efficiency comparable to traditional approaches, and report significant gains in search quality and user engagement.
Abstract
Semantic search with large language models (LLMs) enables retrieval by meaning rather than keyword overlap, but scaling it requires major inference efficiency advances. We present LinkedIn's LLM-based semantic search framework for AI Job Search and AI People Search, combining an LLM relevance judge, embedding-based retrieval, and a compact Small Language Model trained via multi-teacher distillation to jointly optimize relevance and engagement. A prefill-oriented inference architecture co-designed with model pruning, context compression, and text-embedding hybrid interactions boosts ranking throughput by over 75x under a fixed latency constraint while preserving near-teacher-level NDCG, enabling one of the first production LLM-based ranking systems with efficiency comparable to traditional approaches and delivering significant gains in quality and user engagement.
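Since the abstract reports quality as NDCG relative to the teacher, a minimal sketch of that metric may help. It assumes graded relevance labels and the common exponential-gain formulation; the abstract does not say which gain function or cutoff the authors use, so both are illustrative choices, and the function names `dcg` and `ndcg` are hypothetical.

```python
import math

def dcg(relevances, k=None):
    """Discounted cumulative gain of a ranked list of graded relevance labels."""
    rels = relevances[:k] if k is not None else relevances
    # Exponential gain (2^rel - 1) with a log2 position discount (assumed variant).
    return sum((2 ** rel - 1) / math.log2(rank + 2) for rank, rel in enumerate(rels))

def ndcg(relevances, k=None):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0
```

In this sketch, "near-teacher-level NDCG" would mean that `ndcg` computed over the student model's rankings stays close to the value computed over the teacher's rankings on the same labeled queries.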