Large Language Model-Powered Query-Driven Event Timeline Summarization in Industrial Search

📅 2026-05-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses two limitations of traditional topic-centric event timeline construction: it struggles to focus on the sub-events relevant to a user's query, and it is inefficient when candidates are drawn from massive, noisy document collections. To this end, we propose QDET, a query-driven timeline generation system tailored for industrial-scale search engines. QDET strengthens the domain-specific capabilities of a compact language model through multi-task supervised fine-tuning (temporal ordering, causal judgment, and timeline completion) and integrates reinforcement learning with constrained decoding to produce high-quality summaries under strict length limits. Deployed in a live search system, the fine-tuned 7B-parameter model achieves an F1 score of 76.2%, slightly above the zero-shot performance of a 671B-parameter model (76.1%). Online A/B tests show a 5.5% increase in click-through rate, 4.6% longer dwell time, and 4.4% deeper exploration compared to single-task baselines. Moreover, 88.2% of generated summaries satisfy the length constraint, 7.7 percentage points above 671B-scale models.

📝 Abstract

Understanding how events evolve over time is essential for search engines handling queries about trending news. We present QDET (Query-Driven Event Timeline Summarization), a production system deployed on Baidu Search that constructs focused event timelines to explain specific query events. Unlike traditional topic-centric approaches that aim for comprehensive coverage, QDET identifies and organizes sub-events closely relevant to the query from noisy candidate sets formed by millions of documents retrieved daily. QDET incorporates two key innovations: (1) multi-task supervised fine-tuning with three auxiliary tasks (temporal ordering, causal judgment, and timeline completion) that enable compact models to match the performance of much larger general-purpose models in specialized domains; (2) reinforcement learning-based concise event summarization that enforces strict length constraints while maintaining semantic quality, achieving 88.2% length compliance and outperforming 671B-scale models by 7.7 points in constraint satisfaction. Our fine-tuned 7B-parameter model achieves a 76.2% F1 score on timeline summarization, slightly surpassing the zero-shot performance of DeepSeek-R1-671B (76.1% F1) while using roughly 1% of its parameters. This demonstrates that domain-specific optimization enables production-ready models with comparable quality at drastically reduced computational cost. Online A/B tests on Baidu Search validate real-world effectiveness, showing a 5.5% CTR improvement, 4.6% longer dwell time, and 4.4% deeper exploration compared to single-task baselines. We further demonstrate that timeline understanding transfers to heat prediction, confirming effective knowledge transfer to downstream tasks.
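To make the length-constraint figures concrete, the following is a minimal Python sketch of how a length compliance rate could be computed, together with one possible way to fold a length limit into a scalar reward. The abstract does not specify the reward design or the length unit, so the function names, the character-based limit, and the linear overflow penalty are all illustrative assumptions rather than the authors' method.

```python
from typing import Sequence


def length_compliance_rate(summaries: Sequence[str], max_chars: int) -> float:
    """Fraction of summaries whose length is within max_chars.

    Assumption: length is measured in characters; the paper may use
    another unit (for example tokens or words).
    """
    if not summaries:
        return 0.0
    within = sum(1 for s in summaries if len(s) <= max_chars)
    return within / len(summaries)


def length_penalized_reward(
    quality: float,
    summary: str,
    max_chars: int,
    penalty_per_char: float = 0.05,
) -> float:
    """Hypothetical reward: a quality score minus a linear overflow penalty.

    `quality` stands in for any semantic-quality score; a summary within
    the limit keeps its full quality score.
    """
    overflow = max(0, len(summary) - max_chars)
    return quality - penalty_per_char * overflow
```

Under this sketch, a summary that stays within the limit is scored purely on quality, while each character over the limit reduces the reward, which is one simple way a policy could be pushed toward the kind of compliance rate reported above.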

Problem

Research questions and friction points this paper is trying to address.

event timeline summarization

query-driven

industrial search

large language models

temporal understanding

Innovation

Methods, ideas, or system contributions that make the work stand out.

query-driven summarization

multi-task fine-tuning

reinforcement learning for summarization