Reasoner for Real-World Event Detection: Scaling Reinforcement Learning via Adaptive Perplexity-Aware Sampling Strategy

📅 2025-07-01
🤖 AI Summary
Anomaly detection in real-world customer service dialogues is difficult for three reasons: business data are complex, customer interactions change quickly, and models must generalize out of domain (OOD). To address this, we propose a dual-loop dynamic curriculum learning framework that combines large language model (LLM) reasoning with reinforcement learning. Its key component is a perplexity-aware sampling mechanism that progressively selects and schedules harder training instances as the model improves, which makes the model more adaptable and robust in unseen business scenarios. On food delivery dialogue tasks, the method achieves an average F1-score improvement of 17.19% and an average gain of 9.59% in OOD transfer tests, which the authors present as evidence of industrial deployability and cross-domain generalization.
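The summary does not say how perplexity is computed or how the sampling window moves. As a rough illustration only, the sketch below assumes that a sample's difficulty is the policy model's perplexity on it (computed from per-token natural-log probabilities) and that a fixed-size window over the perplexity-ranked pool slides from easy to hard as a training-progress value goes from 0 to 1. The names `perplexity` and `curriculum_window` and the `logprobs` field are hypothetical, not taken from the paper.

```python
import math
from typing import Dict, List, Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity = exp(-mean natural-log probability) over a sample's tokens."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def curriculum_window(
    samples: List[Dict], progress: float, window_frac: float = 0.5
) -> List[Dict]:
    """Return a slice of the perplexity-ranked pool (hypothetical scheme).

    progress=0.0 selects the easiest (lowest-perplexity) window and
    progress=1.0 the hardest; intermediate values slide the window linearly.
    """
    if not samples:
        return []
    progress = min(max(progress, 0.0), 1.0)  # clamp to [0, 1]
    ranked = sorted(samples, key=lambda s: perplexity(s["logprobs"]))
    n = len(ranked)
    size = max(1, round(window_frac * n))     # window covers this many samples
    start = round(progress * (n - size))      # window start moves toward harder samples
    return ranked[start:start + size]


if __name__ == "__main__":
    # Toy pool: higher id -> lower token log-probs -> higher perplexity.
    pool = [{"id": i, "logprobs": [-0.5 * i] * 3} for i in range(6)]
    for p in (0.0, 0.5, 1.0):
        print(p, [s["id"] for s in curriculum_window(pool, p)])
```

Because perplexity depends on the current model, re-scoring the pool periodically would make the window adaptive as the policy improves; whether APARL re-scores in this way is not stated in this listing.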

📝 Abstract
Detecting abnormal events in real-world customer service dialogues is highly challenging due to the complexity of business data and the dynamic nature of customer interactions. Moreover, models must demonstrate strong out-of-domain (OOD) generalization to enable rapid adaptation across different business scenarios and maximize commercial value. In this work, we propose a novel Adaptive Perplexity-Aware Reinforcement Learning (APARL) framework that leverages the advanced reasoning capabilities of large language models for abnormal event detection. APARL introduces a dual-loop dynamic curriculum learning architecture, enabling the model to progressively focus on more challenging samples as its proficiency increases. This design effectively addresses performance bottlenecks and significantly enhances OOD transferability. Extensive evaluations on food delivery dialogue tasks show that our model achieves significantly enhanced adaptability and robustness, attaining the highest F1 score with an average improvement of 17.19%, and an average improvement of 9.59% in OOD transfer tests. This method provides a superior solution for industrial deployment of anomaly detection models, contributing to improved operational efficiency and commercial benefits.
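The abstract describes the dual-loop architecture only at a high level. One possible reading, sketched here as an assumption rather than the authors' design, is an inner loop that runs RL updates on the current difficulty tier and an outer loop that promotes the model to a harder tier once its mean reward crosses a threshold. `DualLoopCurriculum`, `promote_at`, and `make_toy_learner` are hypothetical; in a real system `train_step` would wrap a policy-gradient update and return that sample's reward.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


@dataclass
class DualLoopCurriculum:
    """Hypothetical dual-loop scheduler (not the published APARL algorithm).

    Inner loop: one RL pass over the current difficulty tier.
    Outer loop: promote to the next tier when mean reward >= promote_at.
    """

    tiers: Sequence[Sequence[float]]  # samples grouped easy -> hard (toy difficulty scores)
    promote_at: float = 0.8           # assumed proficiency threshold
    level: int = 0
    history: List[float] = field(default_factory=list)

    def inner_loop(self, train_step: Callable[[float], float]) -> float:
        tier = self.tiers[self.level]
        if not tier:
            raise ValueError(f"tier {self.level} is empty")
        rewards = [train_step(sample) for sample in tier]
        mean_reward = sum(rewards) / len(rewards)
        self.history.append(mean_reward)
        return mean_reward

    def outer_loop(self, train_step: Callable[[float], float], max_rounds: int = 10) -> int:
        for _ in range(max_rounds):
            proficiency = self.inner_loop(train_step)
            if proficiency >= self.promote_at and self.level < len(self.tiers) - 1:
                self.level += 1  # proficiency reached: move on to harder samples
        return self.level


def make_toy_learner(step: float = 0.5) -> Callable[[float], float]:
    """Toy stand-in for an RL update: skill grows on every call, and the
    reward is 1.0 once skill has reached the sample's difficulty, else 0.0."""
    state = {"skill": 0.0}

    def train_step(difficulty: float) -> float:
        state["skill"] += step
        return 1.0 if state["skill"] >= difficulty else 0.0

    return train_step


if __name__ == "__main__":
    curriculum = DualLoopCurriculum(tiers=[[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
    final_level = curriculum.outer_loop(make_toy_learner(), max_rounds=4)
    print(final_level, curriculum.history)
```

Separating the loops keeps the promotion rule independent of the RL update, so the threshold (or a perplexity-based tiering such as the one the summary describes) can be changed without touching the training step.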
Problem

Research questions and friction points this paper is trying to address.

Detecting abnormal events in complex customer service dialogues
Enhancing out-of-domain generalization for diverse business scenarios
Improving adaptability and robustness in anomaly detection models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive Perplexity-Aware Reinforcement Learning framework
Dual-loop dynamic curriculum learning architecture
Enhanced OOD generalization via progressive sampling
Authors

Xiaoyun Zhang, Meituan
Jingqing Ruan, Meituan
Xing Ma, Meituan (NLP engineer; Dialog System, Large Language Model, Conversation Analysis)
Yawen Zhu, Meituan
Jiansong Chen, Meituan
Ke Zeng, Meituan
Xunliang Cai, Meituan