RAG-Enhanced Large Language Models for Dynamic Content Expiration Prediction in Web Search

📅 2026-05-13
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
Traditional web search relies on static time-window filtering, which often fails to align user intent with the semantic freshness of content, frequently returning results that are temporally recent yet semantically outdated. This work proposes the first query-aware dynamic expiration prediction framework tailored for industrial-scale search systems, formulating timeliness modeling as a large language model (LLM)-driven validity reasoning task. By extracting fine-grained temporal context from documents, the framework infers query-specific “validity boundaries” to determine when information becomes obsolete due to semantic shifts. The approach integrates retrieval-augmented generation (RAG), query-aware reasoning, and hallucination suppression mechanisms. Deployed in Baidu Search, it demonstrates significant improvements in result freshness and user experience, as validated by both offline evaluations and online A/B tests.
📝 Abstract
In commercial web search, aligning content freshness with user intent remains challenging because the lifespan of information varies widely. Traditional industrial approaches rely on static time-window filtering, resulting in "one-size-fits-all" rankings where content may be chronologically recent but semantically expired. To address this limitation, we present a novel LLM-based Query-Aware Dynamic Content Expiration Prediction Framework, deployed in Baidu Search, that reformulates timeliness as a dynamic validity inference task. Our framework extracts fine-grained temporal contexts from documents and leverages large language models (LLMs) to deduce a query-specific "validity horizon": a semantic boundary defining when information becomes obsolete based on user intent. The approach is integrated with robust hallucination mitigation strategies to ensure reliability, and has been evaluated through offline experiments and online A/B testing on live production traffic. Results demonstrate significant improvements in search freshness and user experience metrics, validating the effectiveness of LLM-driven reasoning for solving semantic expiration at an industrial scale.
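The contrast the abstract draws between static time-window filtering and a query-specific validity horizon can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the `Document` fields, the 30-day window, and in particular `infer_validity_horizon`, which is a hand-coded stand-in for the LLM reasoning step rather than the authors' method.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Document:
    text: str
    published: date
    # Hypothetical "temporal context" extracted from the page, e.g. the end
    # date of an event or promotion mentioned in the text.
    event_end: Optional[date] = None


def static_window_filter(doc: Document, today: date, window_days: int = 30) -> bool:
    """Baseline: keep any document published within a fixed window."""
    return today - doc.published <= timedelta(days=window_days)


def infer_validity_horizon(query: str, doc: Document) -> Optional[date]:
    """Hand-coded stand-in for the LLM step (an assumption, not the paper's model).

    Returns the date after which the document no longer answers the query,
    or None when the content is treated as evergreen for that query.
    """
    if "sale" in query.lower() and doc.event_end is not None:
        return doc.event_end
    return None


def is_valid(query: str, doc: Document, today: date) -> bool:
    """Keep the document only while today falls inside its validity horizon."""
    horizon = infer_validity_horizon(query, doc)
    return horizon is None or today <= horizon


today = date(2026, 5, 13)
promo = Document("Spring sale ends May 10", date(2026, 5, 1), event_end=date(2026, 5, 10))

kept_by_window = static_window_filter(promo, today)       # recent, so the baseline keeps it
kept_for_sale_query = is_valid("spring sale", promo, today)
kept_for_history_query = is_valid("brand history", promo, today)
```

In this toy case the page is 12 days old, so the fixed window keeps it even though the sale has ended, while the horizon check drops it for the sale query and keeps it for a query whose intent does not depend on the end date.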
Problem

Research questions and friction points this paper is trying to address.

content expiration
web search
semantic freshness
user intent
timeliness
Innovation

Methods, ideas, or system contributions that make the work stand out.

RAG
Large Language Models
Dynamic Content Expiration
Query-Aware Timeliness
Validity Horizon