TiCo: Time-Controllable Training for Spoken Dialogue Models

📅 2026-03-23
🤖 AI Summary
Current spoken dialogue models lack temporal awareness, so they struggle to follow duration-related instructions such as “generate a response lasting approximately 15 seconds”, which degrades the interactive user experience. To address this limitation, this work proposes TiCo, a post-training method that gives the model time awareness by having it emit Spoken Time Markers (STMs) during generation, so that it can adjust the remaining output to meet a target duration. TiCo needs only a small amount of data and no additional question-answer pairs; it relies on self-generated samples and reinforcement learning instead. Experimental results show that TiCo significantly improves adherence to duration constraints while preserving response quality.

📝 Abstract
We propose TiCo, a simple post-training method for enabling spoken dialogue models (SDMs) to follow time-constrained instructions and generate responses with controllable duration. This capability is valuable for real-world spoken language systems such as voice assistants and interactive agents, where controlling response duration can improve interaction quality. However, despite their strong ability to generate natural spoken responses, existing models lack time awareness and struggle to follow duration-related instructions (e.g., "Please generate a response lasting about 15 seconds"). Through an empirical evaluation of both open-source and commercial SDMs, we show that they frequently fail to satisfy such time-control requirements. TiCo addresses this limitation by enabling models to estimate elapsed speaking time during generation through Spoken Time Markers (STMs), e.g., <10.6 seconds>. These markers help the model maintain awareness of time and adjust the remaining content to meet the target duration. TiCo is simple and efficient: it requires only a small amount of data and no additional question-answer pairs, relying instead on self-generation and reinforcement learning. Experimental results show that TiCo significantly improves adherence to duration constraints while preserving response quality.
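
To make the marker idea concrete, here is a minimal Python sketch of how Spoken Time Markers might be handled around generation. The abstract only gives the example <10.6 seconds>, so the marker pattern, the helper names (`extract_markers`, `strip_markers`, `duration_error`, `duration_reward`), the 10% tolerance, and the linear reward shape are all assumptions made for illustration, not the paper's actual implementation or reward.

```python
import re

# Assumed marker syntax, modeled on the abstract's example "<10.6 seconds>".
STM_PATTERN = re.compile(r"<(\d+(?:\.\d+)?) seconds>")


def extract_markers(text):
    """Return the elapsed-time estimates (in seconds) found in the text, in order."""
    return [float(value) for value in STM_PATTERN.findall(text)]


def strip_markers(text):
    """Remove markers and collapse the whitespace they leave behind."""
    return " ".join(STM_PATTERN.sub(" ", text).split())


def duration_error(actual_seconds, target_seconds):
    """Relative deviation of the spoken duration from the target (hypothetical metric)."""
    if target_seconds <= 0:
        raise ValueError("target_seconds must be positive")
    return abs(actual_seconds - target_seconds) / target_seconds


def duration_reward(actual_seconds, target_seconds, tolerance=0.1):
    """Hypothetical reward: 1.0 inside the tolerance band, decaying linearly to 0.0."""
    err = duration_error(actual_seconds, target_seconds)
    if err <= tolerance:
        return 1.0
    return max(0.0, 1.0 - (err - tolerance))


# Hypothetical model output for the instruction "respond in about 15 seconds".
response = (
    "Sure, here is a quick overview. <4.8 seconds> "
    "The main idea is simple. <10.6 seconds> "
    "That covers it. <14.7 seconds>"
)
markers = extract_markers(response)   # the model's running time estimates
spoken_text = strip_markers(response) # what would actually be vocalized
```

In this sketch the last marker stands in for the measured audio length; a real setup would measure the duration of the synthesized speech and could feed a score like `duration_reward` into reinforcement learning.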
Problem

Research questions and friction points this paper is trying to address.

spoken dialogue models
time control
response duration
time-awareness
duration constraints
Innovation

Methods, ideas, or system contributions that make the work stand out.

time-controllable generation
spoken dialogue models
Spoken Time Markers
reinforcement learning
duration control