A Unified Structured Query Understanding Framework for Industrial Semantic Search

📅 2026-05-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses fragmented query understanding in industrial-scale semantic search, where cascades of task-specific modules lead to high maintenance costs and inconsistent behavior on long-tail queries. The authors propose a unified structured query understanding system that consolidates these tasks into a single small language model (SLM) and handles them end to end through schema-constrained generation. To work around scarce labeled data, they introduce Query Illuminator, which serves both as a teacher model for automatic annotation and distillation and as a surrogate judge for scalable evaluation. Deployed in LinkedIn’s Job Search system and extended to People Search in a cross-domain case study, the approach improves user engagement and reduces operational costs while meeting strict latency constraints on limited GPU resources.

📝 Abstract

Query understanding in large-scale industrial search systems is typically implemented as a cascade of disparate, task-specific components. While individually optimizable, this fragmented architecture incurs high maintenance overhead and results in inconsistent behaviors, particularly for long-tail queries. In this work, we propose and deploy a unified structured query understanding system that consolidates these heterogeneous functions into a single Small Language Model (SLM) that performs schema-constrained generation. To address the data bottlenecks inherent in unified modeling, we introduce Query Illuminator, a dual-purpose framework serving as: (i) a teacher model for high-quality auto-annotation and distillation, and (ii) a surrogate judge for scalable evaluation where human labels are scarce. We validate this approach through extensive offline and online tests within LinkedIn's Job Search system. Furthermore, we demonstrate the framework's horizontal extensibility through a cross-domain case study on People Search. The results show improved user engagement and reduced operational costs, achieved while satisfying strict low-latency serving constraints on limited GPU resources.
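
As a rough illustration of what "schema-constrained" output can mean in practice, the sketch below parses a model's JSON output for a job search query and rejects anything that falls outside a fixed schema. It is a minimal sketch under stated assumptions, not the paper's implementation: the field names (`job_title`, `location`, `workplace_type`, `seniority`), their allowed values, and the function `parse_structured_query` are all hypothetical, and the check runs after generation rather than constraining decoding itself, which the abstract does not describe in detail.

```python
import json

# Hypothetical schema: field names and allowed values are illustrative
# assumptions, not taken from the paper.
SCHEMA = {
    "job_title": {"type": str, "required": True},
    "location": {"type": str, "required": False},
    "workplace_type": {
        "type": str,
        "required": False,
        "enum": {"remote", "hybrid", "onsite"},
    },
    "seniority": {
        "type": str,
        "required": False,
        "enum": {"entry", "mid", "senior"},
    },
}


def parse_structured_query(raw_output: str) -> dict:
    """Parse a model's JSON output and reject anything that violates SCHEMA.

    Raises ValueError for malformed JSON, unknown fields, missing required
    fields, wrong types, or values outside an enumerated set.
    """
    data = json.loads(raw_output)  # JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("output must be a JSON object")

    unknown = set(data) - set(SCHEMA)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")

    for name, rule in SCHEMA.items():
        if name not in data:
            if rule["required"]:
                raise ValueError(f"missing required field: {name}")
            continue
        value = data[name]
        if not isinstance(value, rule["type"]):
            raise ValueError(f"field {name} has the wrong type")
        if "enum" in rule and value not in rule["enum"]:
            raise ValueError(f"field {name} has a disallowed value: {value!r}")
    return data


if __name__ == "__main__":
    # Example model output for the query "remote senior data engineer".
    raw = '{"job_title": "data engineer", "workplace_type": "remote", "seniority": "senior"}'
    print(parse_structured_query(raw))
```

A single schema of this kind gives every downstream consumer one contract to depend on, which is one way to read the consolidation argument made in the abstract; a production system would more likely enforce the schema during decoding than validate after the fact.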

Problem

Research questions and friction points this paper is trying to address.

query understanding

industrial semantic search

fragmented architecture

long-tail queries

maintenance overhead

Innovation

Methods, ideas, or system contributions that make the work stand out.

unified query understanding

structured generation

small language model