🤖 AI Summary
This work addresses the challenge of ambiguous user intent in multi-category marketplaces, where sparse contextual signals—such as queries like “Wildflower”—lead existing approaches to suffer from single-label bias or large language model hallucinations. The authors propose a decoupled agent architecture that integrates staged product catalog retrieval with autonomously triggered web search to generate an ordered set of multiple candidate intents. A configurable disambiguation layer then incorporates business logic to resolve ambiguities. This system is the first to combine agent-driven web search with catalog grounding for query understanding, enabling seamless integration of proprietary data sources and rules across arbitrary marketplaces. Deployed on DoorDash, it covers over 95% of daily search impressions, achieving an overall accuracy of 90.7%—a 13.0 percentage point improvement over non-grounded LLMs. Ablation studies show that catalog grounding, web search, and dual-intent disambiguation contribute +8.3pp, +3.2pp, and +1.5pp respectively on long-tail queries.
📝 Abstract
Accurately mapping user queries to business categories is a fundamental Information Retrieval challenge for multi-category marketplaces, where context-sparse queries such as "Wildflower" exhibit intent ambiguity, simultaneously denoting a restaurant chain, a retail product, and a floral item. Traditional classifiers force a winner-takes-all assignment, while general-purpose LLMs hallucinate unavailable inventory. We introduce an Agentic Multi-Source Grounded system that addresses both failure modes by grounding LLM inference in (i) a staged catalog entity retrieval pipeline and (ii) an agentic web-search tool invoked autonomously for cold-start queries. Rather than predicting a single label, the model emits an ordered multi-intent set, resolved by a configurable disambiguation layer that applies deterministic business policies and is designed for extensibility to personalization signals. This decoupled design generalizes across domains, allowing any marketplace to supply its own grounding sources and resolution rules without modifying the core architecture. Evaluated on DoorDash's multi-vertical search platform, the system achieves +10.9pp over the ungrounded LLM baseline and +4.6pp over the legacy production system. On long-tail queries, incremental ablations attribute +8.3pp to catalog grounding, +3.2pp to agentic web search grounding, and +1.5pp to dual intent disambiguation, yielding 90.7% accuracy (+13.0pp over baseline). The system is deployed in production, serving over 95% of daily search impressions, and establishes a generalizable paradigm for applications requiring foundation models grounded in proprietary context and real-time web knowledge to resolve ambiguous, context-sparse decision problems at scale.