An Efficient Task-Oriented Dialogue Policy: Evolutionary Reinforcement Learning Injected by Elite Individuals

📅 2025-06-04
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Deep reinforcement learning (DRL) for task-oriented dialogue policy optimization suffers from exploration-exploitation imbalance, susceptibility to local optima, and unstable convergence in high-dimensional state-action spaces. To address these challenges, this paper proposes a novel evolutionary algorithm (EA)-DRL hybrid framework. Its core contributions are: (1) an elite individual injection mechanism that adaptively selects and transfers high-performing policy parameters from the DRL agent into the EA population; and (2) a population diversity preservation strategy to enhance global search efficiency. Evaluated on four standard benchmarks—MultiWOZ, CamRest, Kvret, and SGD—the method achieves significant improvements: average task success rate increases by 4.2%, convergence stability improves with 37% reduction in variance, and training time decreases by 28%. To our knowledge, this is the first work to realize efficient and robust synergy between EA and DRL for natural language dialogue policy optimization.

📝 Abstract
Deep Reinforcement Learning (DRL) is widely used in task-oriented dialogue systems to optimize dialogue policy, but it struggles to balance exploration and exploitation due to the high dimensionality of state and action spaces. This challenge often results in local optima or poor convergence. Evolutionary Algorithms (EAs) have been shown to explore the solution space of neural networks effectively by maintaining population diversity. Inspired by this, we combine the global search capability of EAs with the local optimization of DRL to balance exploration and exploitation. Nevertheless, the inherent flexibility of natural language in dialogue tasks complicates this direct integration, leading to prolonged evolutionary times. Thus, we further propose an elite individual injection (EII) mechanism that enhances the EA's search efficiency by adaptively introducing best-performing individuals into the population. Experiments across four datasets show that our approach significantly improves the balance between exploration and exploitation, boosting performance. Moreover, they demonstrate that the EII mechanism reduces exploration time, achieving an efficient integration of EA and DRL for task-oriented dialogue policy.
Problem

Research questions and friction points this paper is trying to address.

Balancing exploration and exploitation in dialogue policy optimization
Overcoming local optima and poor convergence in DRL
Reducing evolutionary time via elite injection for EA-DRL integration
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines Evolutionary Algorithms with Deep Reinforcement Learning
Uses elite individual injection to enhance search efficiency
Balances exploration and exploitation in dialogue policy
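The mechanism sketched in the bullets above can be illustrated with a minimal, hypothetical Python toy: an EA evolves a population of policy parameter vectors, and each generation an externally supplied elite individual (standing in for the DRL agent's current parameters) is injected in place of the worst population member. The fitness function, population sizes, and mutation scheme below are illustrative assumptions, not the paper's actual implementation.

```python
import random

def fitness(params):
    # Toy stand-in for policy evaluation: closer to a fixed target
    # vector means higher simulated task success (assumption).
    target = [0.5] * len(params)
    return -sum((p - t) ** 2 for p, t in zip(params, target))

def mutate(params, rng, sigma=0.1):
    # Gaussian perturbation of every parameter.
    return [p + rng.gauss(0, sigma) for p in params]

def evolve(pop, rng, elite=None, generations=50):
    for _ in range(generations):
        # Selection: keep the top half of the population by fitness.
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: len(pop) // 2]
        # Variation: refill the population with mutated survivors.
        pop = survivors + [
            mutate(rng.choice(survivors), rng)
            for _ in range(len(pop) - len(survivors))
        ]
        # Elite individual injection: overwrite the current worst
        # individual with the elite (e.g. the DRL agent's parameters).
        if elite is not None:
            worst = min(range(len(pop)), key=lambda i: fitness(pop[i]))
            pop[worst] = list(elite)
    return max(pop, key=fitness)

rng = random.Random(0)
pop = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(10)]
elite = [0.45, 0.55, 0.5, 0.48]  # hypothetical DRL-trained parameters
best = evolve(pop, rng, elite=elite)
```

Because the elite is re-injected every generation, the returned individual is never worse than the injected one, which mirrors how the paper's EII mechanism keeps high-performing DRL parameters available to the evolutionary search.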
Yangyang Zhao
South China University of Technology
Natural Language Processing, Computer-Human Interaction, Reinforcement Learning, Dialogue Policy
Ben Niu
School of Computer Science and Technology, Changsha University of Science and Technology, China
Libo Qin
School of Computer Science and Engineering, Central South University, China
Shihan Wang
Utrecht University
Machine Learning, Reinforcement Learning, Social Network Analysis