Tap-to-Adapt: Learning User-Aligned Response Timing for Speech Agents

📅 2026-03-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of determining appropriate response timing in spoken interactive systems while staying dynamically aligned with user intent. The authors propose the Tap-to-Adapt framework, which introduces user-initiated light taps as real-time feedback signals that generate response-timing labels online and continuously refine the model. By integrating a dilated temporal convolutional network (Dilated TCN) with a sequence replay strategy, the framework enables end-to-end modeling and evaluation of response timing. Evaluated on approximately 20,000 interaction samples collected from 20 participants, the approach demonstrates significant improvements in both response accuracy and user experience.
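The paper's implementation is not included here; the following is a minimal sketch, under assumptions, of how a dilated TCN of the kind described could map a stream of input feature frames to per-frame response-timing probabilities. The feature dimension, channel width, number of levels, and kernel size are illustrative choices, not values reported in the paper.

```python
# Illustrative sketch only (not the authors' code): a causal dilated TCN that
# scores each frame with a "respond now" probability.
import torch
import torch.nn as nn

class Chomp1d(nn.Module):
    """Trim the trailing frames added by padding so each convolution stays causal."""
    def __init__(self, chomp: int):
        super().__init__()
        self.chomp = chomp

    def forward(self, x):
        return x[..., :-self.chomp] if self.chomp > 0 else x

class DilatedTCN(nn.Module):
    def __init__(self, in_dim=40, hidden=64, levels=4, kernel=3):
        super().__init__()
        blocks, ch = [], in_dim
        for i in range(levels):
            dilation = 2 ** i                      # doubling dilation widens the receptive field
            pad = (kernel - 1) * dilation
            blocks += [
                nn.Conv1d(ch, hidden, kernel, padding=pad, dilation=dilation),
                Chomp1d(pad),
                nn.ReLU(),
            ]
            ch = hidden
        self.tcn = nn.Sequential(*blocks)
        self.head = nn.Conv1d(hidden, 1, kernel_size=1)          # per-frame logit

    def forward(self, x):                                        # x: (batch, in_dim, time)
        return torch.sigmoid(self.head(self.tcn(x))).squeeze(1)  # (batch, time) probabilities
```

Given an input of shape (batch, features, frames), each output frame carries one response probability, which a downstream policy could threshold to decide when the agent should speak.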

📝 Abstract
Response timing judgment is a critical component of interactive speech agents. Although there is substantial prior work on turn-taking modeling and voice wake-up, response timing judgments that are continuously aligned with user intent remain underexplored. To address this, we propose the Tap-to-Adapt framework, which lets users naturally activate or interrupt the agent via tap interactions, from which online learning labels for response timing models are constructed. Under this framework, a Dilated TCN and a sequential replay strategy play significant roles, as demonstrated through data-driven experiments and user studies. We also develop an evaluation and continuous data mining system tailored to the Tap-to-Adapt framework, through which we collected approximately 20,000 samples from user studies involving 20 participants.
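As a reading aid, here is a hedged sketch of what the tap-driven online update described above might look like: a tap yields a timing label for the recent feature window, the labeled window is stored in a replay buffer, and each update mixes the new sample with replayed past samples to limit forgetting. The buffer capacity, replay batch size, and loss are assumptions, not details from the paper.

```python
# Illustrative sketch only: tap-derived labels feed an online update that mixes
# fresh and replayed windows to limit forgetting.
import random
import torch
import torch.nn.functional as F

class ReplayBuffer:
    def __init__(self, capacity=5000):
        self.capacity, self.data = capacity, []

    def add(self, features, labels):
        if len(self.data) >= self.capacity:
            self.data.pop(random.randrange(len(self.data)))  # evict a random old sample
        self.data.append((features, labels))

    def sample(self, k):
        return random.sample(self.data, min(k, len(self.data)))

def online_update(model, optimizer, buffer, new_features, new_labels, replay_k=8):
    """One gradient step on the newly tap-labeled window plus replayed past windows.

    new_features: (in_dim, time) feature window; new_labels: (time,) float 0/1 timing targets.
    """
    buffer.add(new_features, new_labels)
    batch = buffer.sample(replay_k)
    loss = torch.tensor(0.0)
    for feats, labs in batch:
        pred = model(feats.unsqueeze(0)).squeeze(0)       # (time,) probabilities
        loss = loss + F.binary_cross_entropy(pred, labs)  # per-frame timing loss
    optimizer.zero_grad()
    (loss / len(batch)).backward()
    optimizer.step()
```

Replaying past windows alongside each new tap-labeled window is one common way to keep an online-adapted model from drifting toward only the most recent feedback.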
Problem

Research questions and friction points this paper is trying to address.

response timing
speech agents
user intent alignment
interactive systems
turn-taking
Innovation

Methods, ideas, or system contributions that make the work stand out.

Tap-to-Adapt
response timing
Dilated TCN
online learning
user-aligned interaction
Zihong He
The Hong Kong University of Science and Technology (Guangzhou)
Hai-Ning Liang
The Hong Kong University of Science and Technology (Guangzhou)
VR/AR/MR, Games, HCI
Chen Liang
The Hong Kong University of Science and Technology (Guangzhou)