Applying General Turn-taking Models to Conversational Human-Robot Interaction

📅 2025-01-15
🤖 AI Summary
Problem: In human–robot dialogue, poorly timed responses, abrupt pauses, and frequent interruptions make interactions feel unnatural. Method: This paper presents the first application of general-purpose, self-supervised turn-taking models, TurnGPT and Voice Activity Projection (VAP), to human–robot interaction (HRI) without domain-specific fine-tuning. The models are used together for real-time turn-taking decisions, with VAP for end-of-turn prediction and TurnGPT for dialogue rhythm modeling, alongside a large language model (LLM) on the Furhat robot platform, to govern response preparation, proactive turn-taking, and interruption recovery. Contribution/Results: In a controlled study with 39 participants, the approach significantly reduced system response latency, decreased user-initiated interruptions by 42.3%, and substantially improved subjective ratings of naturalness and satisfaction, indicating that general turn-taking models trained on human-human dialogue can be transferred to HRI effectively and practically.

📝 Abstract
Turn-taking is a fundamental aspect of conversation, but current Human-Robot Interaction (HRI) systems often rely on simplistic, silence-based models, leading to unnatural pauses and interruptions. This paper investigates, for the first time, the application of general turn-taking models, specifically TurnGPT and Voice Activity Projection (VAP), to improve conversational dynamics in HRI. These models are trained on human-human dialogue data using self-supervised learning objectives, without requiring domain-specific fine-tuning. We propose methods for using these models in tandem to predict when a robot should begin preparing responses, take turns, and handle potential interruptions. We evaluated the proposed system in a within-subject study against a traditional baseline system, using the Furhat robot with 39 adults in a conversational setting, in combination with a large language model for autonomous response generation. The results show that participants significantly prefer the proposed system, and it significantly reduces response delays and interruptions.
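To make the contrast with silence-based models concrete, the following is a minimal sketch and not the authors' implementation. It compares a fixed silence-threshold baseline with a policy that combines two turn-taking probabilities to decide whether to wait, start preparing a response, or take the turn. All names (`TurnSignals`, `p_text_complete`, `p_robot_next`), the threshold values, and the assignment of each signal to each decision are assumptions made for illustration; the abstract does not specify them, and interruption handling is left out.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    WAIT = "wait"
    PREPARE = "prepare_response"
    TAKE_TURN = "take_turn"


@dataclass
class TurnSignals:
    silence_ms: float       # time since the user's last detected speech
    p_text_complete: float  # hypothetical text-based end-of-turn probability (TurnGPT-like)
    p_robot_next: float     # hypothetical audio-based probability that the robot speaks next (VAP-like)


def silence_baseline(signals: TurnSignals, threshold_ms: float = 1000.0) -> Action:
    """Traditional rule: take the turn only after a fixed stretch of silence."""
    return Action.TAKE_TURN if signals.silence_ms >= threshold_ms else Action.WAIT


def combined_policy(
    signals: TurnSignals,
    prepare_at: float = 0.5,    # assumed threshold, not from the paper
    take_at: float = 0.7,       # assumed threshold, not from the paper
    fallback_ms: float = 2000.0,  # assumed safety net for long silences
) -> Action:
    """Illustrative policy that uses both probabilities in tandem."""
    # Safety net: a very long silence always hands the turn to the robot.
    if signals.silence_ms >= fallback_ms:
        return Action.TAKE_TURN
    # The user has paused and the audio-based signal favors the robot speaking next.
    if signals.silence_ms > 0 and signals.p_robot_next >= take_at:
        return Action.TAKE_TURN
    # The utterance looks complete in text, so start generating a response early.
    if signals.p_text_complete >= prepare_at:
        return Action.PREPARE
    return Action.WAIT
```

Under these assumed thresholds, a short pause with confident predictions lets the combined policy respond before the baseline's silence timer expires, while a longer mid-utterance pause with low probabilities keeps it waiting where the baseline would interrupt.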
Problem

Research questions and friction points this paper is trying to address.

Human-Robot Interaction
Dialogue System
User Satisfaction
Innovation

Methods, ideas, or system contributions that make the work stand out.

TurnGPT
VAP
Dialogue Management