Game-Time: Evaluating Temporal Dynamics in Spoken Language Models

📅 2025-09-30

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Current spoken language models (SLMs) lack systematic evaluation of their temporal capabilities, such as prosodic control, synchronized response generation, and full-duplex interaction, despite the critical role these play in natural spoken dialogue. Method: The authors introduce Game-Time, a benchmark explicitly designed to assess SLMs’ temporal competence. Inspired by how humans learn a language through language activities, Game-Time combines basic instruction-following tasks with advanced tasks under explicit time constraints, such as tempo adherence and synchronized responses. Contribution/Results: Experiments show that state-of-the-art SLMs handle basic tasks well, while many contemporary systems still struggle with fundamental instruction following. Nearly all models degrade substantially on temporally constrained tasks, exposing weaknesses in time awareness and full-duplex interaction. Game-Time provides a structured, quantifiable evaluation framework for temporal dynamics in SLMs and points to concrete directions for model improvement.

📝 Abstract

Conversational Spoken Language Models (SLMs) are emerging as a promising paradigm for real-time speech interaction. However, their capacity for temporal dynamics, including the ability to manage timing, tempo, and simultaneous speaking, remains a critical and unevaluated challenge for conversational fluency. To address this gap, we introduce the Game-Time Benchmark, a framework to systematically assess these temporal capabilities. Inspired by how humans learn a language through language activities, Game-Time consists of basic instruction-following tasks and advanced tasks with temporal constraints, such as tempo adherence and synchronized responses. Our evaluation of diverse SLM architectures reveals a clear performance disparity: while state-of-the-art models handle basic tasks well, many contemporary systems still struggle with fundamental instruction following. More critically, nearly all models degrade substantially under temporal constraints, exposing persistent weaknesses in time awareness and full-duplex interaction. The Game-Time Benchmark provides a foundation for guiding future research toward more temporally aware conversational AI. Demos and datasets are available on our project website: https://ga642381.github.io/Game-Time.

Problem

Research questions and friction points this paper is trying to address.

Evaluating SLMs' temporal dynamics for conversational fluency

Assessing timing, tempo, and simultaneous speaking capabilities

Addressing performance gaps under temporal constraints

Innovation

Methods, ideas, or system contributions that make the work stand out.

Game-Time Benchmark evaluates SLM temporal dynamics

Framework includes basic and advanced temporal tasks

Assesses timing, tempo, and simultaneous speaking capabilities
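
To make the idea of a task with a temporal constraint more concrete, here is a minimal Python sketch of how a response could be checked against a target duration or speaking rate. It is an illustration only, not the benchmark's scoring method, which this page does not describe. The `Word` type, the function names, and the assumption that word-level timestamps are available (for example, from a forced aligner) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One spoken word with hypothetical onset/offset timestamps in seconds."""
    text: str
    start: float
    end: float


def speaking_duration(words):
    """Elapsed time from the first word's onset to the last word's offset."""
    if not words:
        return 0.0
    return words[-1].end - words[0].start


def meets_duration_constraint(words, target_s, tolerance_s):
    """True if the response duration is within tolerance_s of target_s."""
    return abs(speaking_duration(words) - target_s) <= tolerance_s


def words_per_minute(words):
    """Average speaking rate over the whole response; 0.0 if duration is zero."""
    duration = speaking_duration(words)
    if duration <= 0:
        return 0.0
    return len(words) * 60.0 / duration


# Hypothetical response to an instruction such as "count to three slowly".
response = [
    Word("one", 0.0, 0.5),
    Word("two", 1.0, 1.5),
    Word("three", 2.0, 2.5),
]
ok = meets_duration_constraint(response, target_s=3.0, tolerance_s=0.5)
rate = words_per_minute(response)
```

A check like this covers only timing and tempo for a single speaker. Synchronized or full-duplex tasks would additionally need the timestamps of the user's speech so that overlap between the two streams can be measured.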

🔎 Similar Papers

Time Awareness in Large Language Models: Benchmarking Fact Recall Across Time