The ICASSP 2026 HumDial Challenge: Benchmarking Human-like Spoken Dialogue Systems in the LLM Era

📅 2026-01-09
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
This work addresses the lack of systematic evaluation of human-like capabilities in spoken dialogue systems—particularly in emotional intelligence and full-duplex interaction—in the era of large language models. To this end, we propose the first comprehensive benchmarking framework for human-like spoken dialogue, integrating two core dimensions: long-term emotional understanding with empathetic response generation, and real-time, listen-while-speaking decision-making. Leveraging authentic human conversations, we construct a large-scale, high-quality dataset and develop a standardized end-to-end evaluation platform powered by Audio-LLMs and Omni-models. The framework was deployed in the inaugural HumDial Challenge at ICASSP 2026, where we released the benchmark dataset and track configurations, and conducted a systematic analysis of participating teams’ performance, thereby establishing a reliable foundation and clear direction for future research on human-like dialogue systems.

📝 Abstract
Driven by the rapid advancement of Large Language Models (LLMs), particularly Audio-LLMs and Omni-models, spoken dialogue systems have evolved significantly, progressively narrowing the gap between human-machine and human-human interactions. Achieving truly "human-like" communication necessitates a dual capability: emotional intelligence to perceive and resonate with users' emotional states, and robust interaction mechanisms to navigate the dynamic, natural flow of conversation, such as real-time turn-taking. Therefore, we launched the first Human-like Spoken Dialogue Systems Challenge (HumDial) at ICASSP 2026 to benchmark these dual capabilities. Anchored by a sizable dataset derived from authentic human conversations, this initiative establishes a fair evaluation platform across two tracks: (1) Emotional Intelligence, targeting long-term emotion understanding and empathetic generation; and (2) Full-Duplex Interaction, systematically evaluating real-time decision-making under "listening-while-speaking" conditions. This paper summarizes the dataset, track configurations, and the final results.
Problem

Research questions and friction points this paper is trying to address.

Human-like Spoken Dialogue
Emotional Intelligence
Full-Duplex Interaction
LLM-based Dialogue Systems
Benchmarking
Innovation

Methods, ideas, or system contributions that make the work stand out.

Human-like Spoken Dialogue
Emotional Intelligence
Full-Duplex Interaction
Audio-LLM
Benchmarking