🤖 AI Summary
This work addresses the challenge of modeling and naturally expressing the implicit chains of thought underlying full-duplex dialogue by proposing a Graph-of-Thoughts (GoT) architecture. The approach uses a multi-level perceptual framework to capture the causal and temporal dependencies linking communicative intent to verbal behavior, combining a hierarchical annotation scheme, graph-structured reasoning, and a Transformer backbone that supports streaming, dynamic inference; the model is trained on a high-quality, controllable, human-annotated dialogue corpus. The study introduces the first foundational model tailored to full-duplex dialogue behavior modeling, demonstrating robust behavior detection and interpretable chain-of-thought generation on both synthetic and real-world data, and establishing a new benchmark for dialogue reasoning.
📝 Abstract
Human conversation is organized by an implicit chain of thoughts that manifests as timed speech acts. Capturing this perceptual pathway is key to building natural full-duplex interactive systems. We introduce a framework that models this process as multi-level perception and then reasons over conversational behaviors via a Graph-of-Thoughts (GoT). Our approach formalizes the intent-to-action pathway with a hierarchical labeling scheme, predicting high-level communicative intents and low-level speech acts to learn their causal and temporal dependencies. To train this system, we develop a high-quality corpus that pairs controllable, event-rich dialogue data with human-annotated labels. The GoT framework structures streaming predictions as an evolving graph, enabling a Transformer to forecast the next speech act, generate concise justifications for its decisions, and dynamically refine its reasoning. Experiments on both synthetic and real full-duplex dialogues show that the framework delivers robust behavior detection, produces interpretable reasoning chains, and establishes a foundation for benchmarking conversational reasoning in full-duplex spoken dialogue systems.
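To make the "evolving graph" idea concrete, the sketch below shows a minimal data structure for streaming GoT-style predictions: timestamped nodes at two levels (intent and speech act) linked by causal/temporal edges, with a frontier query that would supply context for forecasting the next act. All names, label sets, and the frontier heuristic are illustrative assumptions, not the paper's actual implementation.

```python
from dataclasses import dataclass, field

# Hypothetical label sets; the paper's real hierarchical scheme is richer.
INTENTS = {"inform", "request", "acknowledge"}          # high-level communicative intents
SPEECH_ACTS = {"statement", "question", "backchannel"}  # low-level speech acts

@dataclass
class ThoughtNode:
    """One node in the evolving graph: an intent or speech act with an onset time."""
    label: str
    level: str                                   # "intent" or "act"
    time: float                                  # onset in the dialogue stream (seconds)
    parents: list = field(default_factory=list)  # causal/temporal predecessors

class GraphOfThought:
    """Minimal evolving-graph container; a full system would encode this for a Transformer."""
    def __init__(self):
        self.nodes = []

    def add(self, label, level, time, parents=()):
        assert level in ("intent", "act")
        node = ThoughtNode(label, level, time, list(parents))
        self.nodes.append(node)
        return node

    def frontier(self):
        """Nodes with no successors yet — the context for predicting the next speech act."""
        has_child = {id(p) for n in self.nodes for p in n.parents}
        return [n for n in self.nodes if id(n) not in has_child]

# Streaming example: an intent spawns a speech act, then a listener backchannel.
g = GraphOfThought()
intent = g.add("inform", "intent", 0.0)
act = g.add("statement", "act", 0.4, parents=[intent])
g.add("backchannel", "act", 1.2, parents=[act])
print([n.label for n in g.frontier()])  # → ['backchannel']
```

In this toy version, forecasting would condition on `frontier()` plus the incoming audio features; dynamic refinement corresponds to attaching new parents to existing nodes as evidence accumulates.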