DyBBT: Dynamic Balance via Bandit inspired Targeting for Dialog Policy with Cognitive Dual-Systems

📅 2025-09-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
In task-oriented dialogue systems, static exploration strategies fail to adapt to dynamic contextual shifts, resulting in inefficient exploration and policy fragility. To address this, we propose a dynamic balancing framework inspired by the cognitive dual-system theory (intuitive “fast” vs. deliberative “slow” decision-making). Our method introduces a structured cognitive state space and a multi-armed-bandit-inspired meta-controller that dynamically switches between fast, intuition-driven responses and slow, reasoning-intensive actions. We further integrate user uncertainty modeling with slot dependency awareness, and enable adaptive exploration via visit counting and context-aware state representations. Evaluated on both single- and multi-domain benchmarks, our approach improves task success rate and interaction efficiency while generalizing well. Human evaluation confirms close alignment between system decisions and expert judgments.

📝 Abstract
Task-oriented dialog systems often rely on static exploration strategies that do not adapt to dynamic dialog contexts, leading to inefficient exploration and suboptimal performance. We propose DyBBT, a novel dialog policy learning framework that formalizes the exploration challenge through a structured cognitive state space capturing dialog progression, user uncertainty, and slot dependency. DyBBT introduces a bandit-inspired meta-controller that dynamically switches between fast intuitive inference (System 1) and a slow deliberative reasoner (System 2) based on real-time cognitive states and visitation counts. Extensive experiments on single- and multi-domain benchmarks show that DyBBT achieves state-of-the-art performance in success rate, efficiency, and generalization, with human evaluations confirming that its decisions are well aligned with expert judgment. Code is available at https://github.com/carsonz/DyBBT.
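
The abstract names three components of the cognitive state (dialog progression, user uncertainty, slot dependency) but does not say how they are encoded. As a minimal sketch, assuming each component is a score in [0, 1] and that states are bucketed so that visits can be counted, one possible representation is shown below. The class name CognitiveState, its field names, and the bin count are hypothetical and are not taken from the paper or its repository.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CognitiveState:
    """Hypothetical container for the three signals named in the abstract."""
    progress: float      # assumed: fraction of the dialog completed, in [0, 1]
    uncertainty: float   # assumed: estimated user uncertainty, in [0, 1]
    dependency: float    # assumed: strength of slot dependency, in [0, 1]

    def bucket(self, bins: int = 5) -> Tuple[int, int, int]:
        """Map the continuous state to a discrete key usable for visit counting."""
        def quantize(x: float) -> int:
            x = min(max(x, 0.0), 1.0)           # clamp to [0, 1]
            return min(int(x * bins), bins - 1)  # keep 1.0 in the last bin
        return (quantize(self.progress),
                quantize(self.uncertainty),
                quantize(self.dependency))
```

Bucketing is only one way to make visitation counts well defined over a continuous state; the paper may use a different scheme.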

Problem

Research questions and friction points this paper is trying to address.

Static exploration strategies fail to adapt to dynamic dialog contexts
Inefficient exploration leads to suboptimal dialog system performance
Need to balance fast intuitive inference with slow deliberative reasoning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic bandit-inspired meta-controller for dialog policy
Switches between fast intuitive and slow deliberative reasoning
Uses cognitive states and visitation counts for targeting
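
To make the switching idea concrete, here is a rough sketch of a count-based meta-controller. It assumes a UCB-style exploration bonus and a fixed threshold: rarely visited states receive a large bonus and are routed to the slow reasoner (System 2), while familiar states fall back to the fast policy (System 1). The bonus formula, the threshold value, and the names MetaController, bonus, and choose are illustrative assumptions; the summary above does not specify DyBBT's actual decision rule.

```python
import math
from collections import defaultdict
from typing import Hashable


class MetaController:
    """Illustrative count-based switch between System 1 and System 2."""

    def __init__(self, threshold: float = 0.5) -> None:
        self.threshold = threshold       # assumed cutoff on the bonus
        self.visits = defaultdict(int)   # per-state visitation counts
        self.total = 0                   # total number of decisions so far

    def bonus(self, state: Hashable) -> float:
        """UCB-style bonus (an assumption): shrinks as the state is revisited."""
        n = self.visits[state]
        return math.sqrt(math.log(self.total + 1) / (n + 1))

    def choose(self, state: Hashable) -> str:
        """Return which system handles this turn, then record the visit."""
        self.total += 1
        system = "system2" if self.bonus(state) > self.threshold else "system1"
        self.visits[state] += 1
        return system
```

Under these assumptions, a state key that has been seen many times is handled by System 1, while a novel key triggers System 2 even late in training, because the bonus depends on the per-state count and not only on the global step.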