AI Summary
This work addresses the challenges of error propagation and high maintenance costs in traditional modular voice response systems for large-scale point-of-interest (POI) attribute collection. To overcome these limitations, the authors propose an end-to-end dialogue system powered by a large language model. The approach leverages finite state machine (FSM)-guided data augmentation to mitigate long-tailed data distributions, integrates chain-of-thought (CoT) reasoning with selective generation to suppress hallucinations, and introduces a dual-evaluator collaborative iterative learning mechanism for continuous policy optimization with minimal human intervention. Deployed in production, the system handles approximately 400,000 calls per day, achieving a task success rate of 83.9%, a 4-percentage-point improvement over the previous system, with an average response latency of only 130 milliseconds.
Abstract
Accurate Point of Interest (POI) attribute acquisition is essential for location-based services, yet traditional modular Interactive Voice Response (IVR) systems suffer from error accumulation and high maintenance overhead. We present DuIVRS-2, a large language model (LLM)-based end-to-end framework designed for large-scale POI attribute acquisition at Baidu Maps. To address the long-tail distribution of real-world interactions, our methodology first employs a finite state machine (FSM)-guided data augmentation strategy to synthesize a balanced and diverse training dataset. We then streamline dialogue management via a selective generation scheme combined with a Chain-of-Thought (CoT) mechanism, which ensures output stability and effectively eliminates hallucinations in industrial settings. To facilitate continuous policy refinement with minimal manual effort, we design a cooperative iterative learning framework that leverages a dual-evaluator voting system. Deployed in production for two months, DuIVRS-2 processed 0.4 million calls daily and achieved an 83.9% Task Success Rate (TSR), outperforming its predecessor by 4 percentage points while maintaining a low response latency of 130 ms. This work provides a production-proven reference for developing robust, cost-effective LLM agents for large-scale industrial dialogue applications.
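The abstract does not spell out how the FSM-guided augmentation works, so the following is only a minimal sketch of the general idea: walking a dialogue state machine with uniformly chosen transitions, so that rare branches appear in the synthetic data far more often than they would in real call logs. Every state name, the transition table, and the helper functions `sample_path` and `augment` are hypothetical illustrations, not taken from DuIVRS-2.

```python
import random
from collections import Counter

# Hypothetical dialogue states and transitions (assumed for illustration;
# the paper's actual FSM is not described in the abstract).
FSM = {
    "GREETING": ["ASK_OPEN_STATUS", "WRONG_NUMBER", "CALLEE_BUSY"],
    "ASK_OPEN_STATUS": ["ASK_HOURS", "PERMANENTLY_CLOSED"],
    "ASK_HOURS": ["CLOSING", "CLARIFY_HOURS"],
    "CLARIFY_HOURS": ["CLOSING"],
    "WRONG_NUMBER": ["CLOSING"],
    "CALLEE_BUSY": ["CLOSING"],
    "PERMANENTLY_CLOSED": ["CLOSING"],
    "CLOSING": [],  # terminal state: no outgoing transitions
}


def sample_path(rng, start="GREETING"):
    """Walk the FSM from `start` to a terminal state.

    Each outgoing transition is chosen uniformly, not by its real-world
    frequency, which is what boosts long-tail branches.
    """
    path = [start]
    while FSM[path[-1]]:
        path.append(rng.choice(FSM[path[-1]]))
    return path


def augment(n, seed=0):
    """Generate `n` synthetic dialogue-state paths with a fixed seed."""
    rng = random.Random(seed)
    return [sample_path(rng) for _ in range(n)]


paths = augment(1000)
# Count how often each state occurs across all synthetic paths.
coverage = Counter(state for path in paths for state in path)
```

In a real pipeline each sampled path would then be turned into utterances, for example by prompting an LLM per state, which is outside the scope of this sketch. Inspecting `coverage` shows how evenly the states are represented.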