AI Summary
This work addresses the document-level query auto-completion (DocQAC) task by proposing an adaptive prefix tree (trie) guided decoding framework. The approach uses the user's input prefix to softly steer encoder-decoder models such as T5 and BART, and incorporates lightweight document signals, including titles, keyphrases, and summaries, to generate precise completions. It further introduces an adaptive penalty mechanism governed by tunable hyperparameters, enabling a principled trade-off between language model confidence and prefix tree constraints. On a newly constructed DocQAC benchmark derived from ORCAS, the proposed method outperforms strong baselines and even surpasses much larger instruction-tuned models such as LLaMA-3 and Phi-3 on seen queries, across both seen and unseen documents.
Abstract
Query auto-completion (QAC) has been widely studied in the context of web search, yet remains underexplored for in-document search, which we term DocQAC. DocQAC aims to enhance search productivity within long documents by helping users craft faster, more precise queries, even for complex or hard-to-spell terms. While global historical queries are available to both WebQAC and DocQAC, DocQAC uniquely accesses document-specific context, including the current document's content and its specific history of user query interactions.
To address this setting, we propose a novel adaptive trie-guided decoding framework that uses user query prefixes to softly steer language models toward high-quality completions. Our approach introduces an adaptive penalty mechanism with tunable hyperparameters, enabling a principled trade-off between model confidence and trie-based guidance. To efficiently incorporate document context, we explore retrieval-augmented generation (RAG) and lightweight contextual document signals such as titles, keyphrases, and summaries.
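To make the idea of soft trie guidance concrete, the following is a minimal sketch, not the authors' implementation. It assumes a word-level trie built from past queries and a hypothetical penalty of the form `alpha * (1 - p_max) ** beta`, where `p_max` is the model's top next-token probability; the names `TrieNode`, `build_trie`, `adjust_scores`, `alpha`, and `beta` are illustrative and do not come from the paper. Tokens that leave the trie are penalized rather than forbidden, and the penalty shrinks as the model becomes more confident.

```python
import math


class TrieNode:
    """A node in a word-level trie of previously seen queries."""

    def __init__(self):
        self.children = {}
        self.is_end = False


def build_trie(queries):
    """Insert each whitespace-tokenized query into a trie and return the root."""
    root = TrieNode()
    for query in queries:
        node = root
        for token in query.split():
            node = node.children.setdefault(token, TrieNode())
        node.is_end = True
    return root


def adjust_scores(log_probs, trie_node, alpha=2.0, beta=1.0):
    """Softly penalize next-token candidates that are not children of trie_node.

    log_probs: dict mapping candidate token -> model log-probability.
    The penalty form below is an assumption made for illustration only:
    it is large when the model is unsure and near zero when it is confident.
    """
    if trie_node is None:
        # The partial completion already left the trie: no guidance available.
        return dict(log_probs)
    p_max = math.exp(max(log_probs.values()))
    penalty = alpha * (1.0 - p_max) ** beta
    return {
        token: lp if token in trie_node.children else lp - penalty
        for token, lp in log_probs.items()
    }


# Toy usage: the user has typed "neural"; the trie knows two continuations.
root = build_trie(["neural network training", "neural machine translation"])
node = root.children["neural"]

# Uncertain model: the off-trie token "nets" is penalized enough to lose.
unsure = {"nets": math.log(0.40), "network": math.log(0.35), "machine": math.log(0.25)}
best_unsure = max(adjust_scores(unsure, node), key=adjust_scores(unsure, node).get)

# Confident model: the penalty is small, so the model's own choice survives.
sure = {"nets": math.log(0.98), "network": math.log(0.01), "machine": math.log(0.01)}
best_sure = max(adjust_scores(sure, node), key=adjust_scores(sure, node).get)
```

In this toy setting, raising `alpha` pushes decoding toward trie-consistent completions, while raising `beta` makes the penalty fade faster as model confidence grows. A real system would apply the same adjustment to subword logits at each decoding step.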
When applied to encoder-decoder models like T5 and BART, our trie-guided framework outperforms strong baselines and even surpasses much larger instruction-tuned models such as LLaMA-3 and Phi-3 on seen queries across both seen and unseen documents. This demonstrates its practicality for real-world DocQAC deployments, where efficiency and scalability are critical. We evaluate our method on a newly introduced DocQAC benchmark derived from ORCAS, enriched with query-document pairs. We make both the DocQAC dataset (https://bit.ly/3IGEkbH) and code (https://github.com/rahcode7/DocQAC) publicly available.