DuIVRS-2: An LLM-based Interactive Voice Response System for Large-scale POI Attribute Acquisition

📅 2026-05-18

🤖 AI Summary

This work addresses error propagation and high maintenance costs in traditional modular voice response systems used for large-scale point-of-interest (POI) attribute collection. To overcome these limitations, the authors propose an end-to-end dialogue system powered by a large language model. The approach uses finite state machine (FSM)-guided data augmentation to counter long-tailed data distributions, combines chain-of-thought (CoT) reasoning with selective generation to suppress hallucinations, and introduces a dual-evaluator cooperative iterative learning mechanism for continuous policy optimization with minimal human intervention. Deployed in production, the system handles approximately 400,000 calls per day and achieves a task success rate of 83.9%, a 4-percentage-point improvement over the previous system, with an average response latency of 130 milliseconds.
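The summary does not spell out how the two evaluators vote. One plausible reading, sketched below purely as an assumption, is that dialogues on which both evaluators agree are labeled automatically, while disagreements are routed to a human reviewer. The function `triage` and the two toy evaluators are hypothetical names introduced for illustration and are not taken from the paper.

```python
from typing import Callable, List, Tuple

# An evaluator maps a dialogue transcript to a pass/fail verdict.
Evaluator = Callable[[str], bool]


def triage(
    dialogues: List[str], eval_a: Evaluator, eval_b: Evaluator
) -> Tuple[List[str], List[str], List[str]]:
    """Split dialogues into (accepted, rejected, needs_review) by agreement."""
    accepted, rejected, needs_review = [], [], []
    for d in dialogues:
        a, b = eval_a(d), eval_b(d)
        if a and b:
            accepted.append(d)        # both evaluators pass: reuse as training data
        elif not a and not b:
            rejected.append(d)        # both evaluators fail: discard
        else:
            needs_review.append(d)    # disagreement: send to a human
    return accepted, rejected, needs_review


# Toy stand-ins for the two evaluators (hypothetical rules, illustration only).
def has_attribute_answer(d: str) -> bool:
    return "open until" in d


def ends_politely(d: str) -> bool:
    return d.rstrip().endswith("Thank you.")
```

Under this reading, human effort is spent only on the disagreement bucket, which is one way the "minimal human intervention" goal could be approached.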

📝 Abstract

Accurate Point of Interest (POI) attribute acquisition is essential for location-based services, yet traditional modular Interactive Voice Response (IVR) systems suffer from error accumulation and high maintenance overhead. We present DuIVRS-2, a large language model (LLM)-based end-to-end framework designed for large-scale POI attribute acquisition at Baidu Maps. To address the long-tail distribution of real-world interactions, our methodology first employs a finite state machine (FSM)-guided data augmentation strategy to synthesize a balanced and diverse training dataset. We then streamline dialogue management via a selective generation scheme combined with a Chain-of-Thought (CoT) mechanism, which ensures output stability and effectively eliminates hallucinations in industrial settings. To facilitate continuous policy refinement with minimal manual effort, we design a cooperative iterative learning framework that leverages a dual-evaluator voting system. Deployed in production for two months, DuIVRS-2 processed 0.4 million calls daily and achieved an 83.9% Task Success Rate (TSR), outperforming its predecessor by 4 percentage points while maintaining a low reaction time of 130 ms. This work provides a production-proven reference for developing robust, cost-effective LLM agents for large-scale industrial dialogue applications.
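To make the idea of FSM-guided augmentation concrete, here is a minimal sketch under stated assumptions. The abstract does not describe the FSM itself, so the states, intents, and the uniform sampling rule below are all hypothetical. The point is only that walking a dialogue FSM with uniform transition choices produces rare branches (for example, a refusal) far more often than real call logs would.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical dialogue FSM: state -> list of (user_intent, next_state).
# These states and intents are illustrative, not taken from the paper.
FSM: Dict[str, List[Tuple[str, str]]] = {
    "greet": [("confirm_identity", "ask_attribute"),
              ("wrong_number", "end"),
              ("busy", "end")],
    "ask_attribute": [("answer", "confirm"),
                      ("unclear", "ask_attribute"),
                      ("refuse", "end")],
    "confirm": [("yes", "end"), ("no", "ask_attribute")],
}


def sample_path(rng: random.Random, max_turns: int = 8) -> List[Tuple[str, str]]:
    """Walk the FSM from 'greet', picking each transition uniformly at random."""
    state, path = "greet", []
    while state != "end" and len(path) < max_turns:
        intent, next_state = rng.choice(FSM[state])
        path.append((state, intent))
        state = next_state
    return path


def synthesize(n: int, seed: int = 0) -> List[List[Tuple[str, str]]]:
    """Generate n synthetic dialogue skeletons as (state, intent) sequences."""
    rng = random.Random(seed)
    return [sample_path(rng) for _ in range(n)]
```

Each sampled skeleton would still need to be turned into utterances (by templates or a generator model) before it could serve as training data; that step is outside this sketch.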

Problem

Research questions and friction points this paper is trying to address.

Interactive Voice Response

POI attribute acquisition

error accumulation

maintenance overhead

large-scale

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-based IVR

FSM-guided data augmentation

Chain-of-Thought reasoning