GEM: Graph-Enhanced Mixture-of-Experts with ReAct Agents for Dialogue State Tracking

📅 2026-05-05

📈 Citations: 0

✨ Influential: 0

📝 Abstract

Dialogue State Tracking (DST) requires precise extraction of structured information from multi-domain conversations, a task on which Large Language Models (LLMs) struggle despite their strong general capabilities. We present GEM (Graph-Enhanced Mixture-of-Experts), a framework that combines language models and graph-structured dialogue understanding with ReAct agent-based reasoning. Our approach dynamically routes between two specialized experts, coordinated by a router: a Graph Neural Network that captures dialogue structure and turn-level dependencies, and a fine-tuned T5-Small encoder-decoder for sequence modeling. For complex value generation tasks, we integrate ReAct agents that perform structured reasoning over the dialogue context. On MultiWOZ 2.2, GEM achieves 65.19% Joint Goal Accuracy, outperforming end-to-end LLM approaches (best: 38.43%) and surpassing state-of-the-art (SOTA) methods including TOATOD (63.79%), D3ST (58.70%), and Diable (56.48%). These results indicate that combining structured dialogue representation with dynamic expert routing and agent-based reasoning is effective for dialogue state tracking, improving accuracy while maintaining computational efficiency through selective expert activation.
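
The abstract reports results as Joint Goal Accuracy. As commonly defined for DST, a turn counts as correct only if the predicted dialogue state matches the gold state on every slot. The sketch below illustrates that definition only; it is not the paper's evaluation script, and the slot names and values in the example are made up for illustration.

```python
from typing import Dict, List

# A dialogue state maps slot names to values, e.g. {"hotel-area": "north"}.
State = Dict[str, str]


def joint_goal_accuracy(predicted: List[State], gold: List[State]) -> float:
    """Fraction of turns whose predicted state exactly equals the gold state."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must cover the same turns")
    if not gold:
        return 0.0
    # Dict equality requires the same slots AND the same values,
    # so one wrong or missing slot makes the whole turn incorrect.
    correct = sum(1 for p, g in zip(predicted, gold) if p == g)
    return correct / len(gold)


# Illustrative (made-up) two-turn dialogue: the second turn has one wrong slot.
gold_states = [
    {"hotel-area": "north"},
    {"hotel-area": "north", "hotel-stars": "4"},
]
predicted_states = [
    {"hotel-area": "north"},
    {"hotel-area": "north", "hotel-stars": "3"},
]
jga = joint_goal_accuracy(predicted_states, gold_states)
print(f"JGA = {jga:.2f}")  # prints "JGA = 0.50"
```

Because a single slot error invalidates the whole turn, Joint Goal Accuracy is considerably stricter than per-slot accuracy. Real evaluation scripts usually also normalize slot values before comparison, which this sketch omits.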

Problem

Research questions and friction points this paper is trying to address.

Dialogue State Tracking

Large Language Models

Structured Information Extraction

Multi-domain Conversations

Graph-structured Dialogue Understanding

Innovation

Methods, ideas, or system contributions that make the work stand out.

Graph Neural Network

Mixture-of-Experts

ReAct Agents
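
The abstract describes a router that chooses between experts so that only the selected expert is activated. A minimal sketch of generic top-1 mixture-of-experts routing follows. The feature vector, gate weights, and stub experts are all hypothetical stand-ins; the paper's actual router, GNN expert, and T5-Small expert are not specified here and may differ substantially.

```python
import math
from typing import Callable, Dict, List, Tuple

State = Dict[str, str]
Expert = Callable[[str], State]


def gate_probabilities(
    features: List[float], gate_weights: Dict[str, List[float]]
) -> Dict[str, float]:
    """Softmax over one linear score per expert (hypothetical linear gate)."""
    scores = {
        name: sum(w * x for w, x in zip(weights, features))
        for name, weights in gate_weights.items()
    }
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {name: math.exp(s - top) for name, s in scores.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


def route_top1(
    dialogue: str,
    features: List[float],
    gate_weights: Dict[str, List[float]],
    experts: Dict[str, Expert],
) -> Tuple[str, State]:
    """Run only the highest-probability expert (selective activation)."""
    probs = gate_probabilities(features, gate_weights)
    chosen = max(probs, key=probs.get)
    return chosen, experts[chosen](dialogue)


# Hypothetical stub experts that record when they are called.
calls: List[str] = []


def graph_expert(dialogue: str) -> State:
    calls.append("graph")
    return {"hotel-area": "north"}


def sequence_expert(dialogue: str) -> State:
    calls.append("sequence")
    return {"hotel-area": "north"}


experts = {"graph": graph_expert, "sequence": sequence_expert}
gate_weights = {"graph": [2.0, 0.0], "sequence": [0.0, 2.0]}  # made-up weights
features = [1.0, 0.0]  # made-up dialogue features

chosen, state = route_top1("I need a hotel in the north.", features, gate_weights, experts)
print(chosen, state, calls)
```

Only the chosen expert runs, so the cost per turn is that of one expert plus the gate. That is the general idea behind efficiency through selective activation in mixture-of-experts designs.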