BoSS: Beyond-Semantic Speech

📅 2025-07-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current speech technologies (ASR/TTS) are largely confined to explicit semantic understanding and fail to model beyond-semantic communicative signals such as emotion, contextual dynamics, and implicit meaning, which limits the naturalness and depth of human–machine interaction. To address this, we introduce a five-level framework of spoken interaction system capability (L1–L5) that traces the progression from basic command recognition to human-like social interaction, and we propose the “Beyond-Semantic Speech” (BoSS) paradigm to characterize the implicit signals that the higher levels require. Drawing on cognitive relevance theories and machine learning models, we formalize the modeling of temporally dynamic, context-dependent information that goes beyond explicit semantics. An evaluation across five BoSS-related dimensions shows that current spoken language models struggle to fully interpret beyond-semantic signals. This work contributes a paradigm, a formal framework, and an evaluation of BoSS capabilities as steps toward human-like speech intelligence.

📝 Abstract
Human communication involves more than explicit semantics: implicit signals and contextual cues play a critical role in shaping meaning. However, modern speech technologies, such as Automatic Speech Recognition (ASR) and Text-to-Speech (TTS), often fail to capture these beyond-semantic dimensions. To better characterize and benchmark the progression of speech intelligence, we introduce Spoken Interaction System Capability Levels (L1-L5), a hierarchical framework that illustrates the evolution of spoken dialogue systems from basic command recognition to human-like social interaction. To support these advanced capabilities, we propose Beyond-Semantic Speech (BoSS), which refers to the set of information in speech communication that encompasses but transcends explicit semantics. BoSS conveys emotions and contexts, and it modifies or extends meaning through multidimensional features such as affective cues, contextual dynamics, and implicit semantics, thereby enhancing the understanding of communicative intentions and scenarios. We present a formalized framework for BoSS that leverages cognitive relevance theories and machine learning models to analyze temporal and contextual speech dynamics. We evaluate BoSS-related attributes across five dimensions and find that current spoken language models (SLMs) struggle to fully interpret beyond-semantic signals. These findings highlight the need to advance BoSS research to enable richer, more context-aware human-machine communication.
Problem

Research questions and friction points this paper is trying to address.

Capturing implicit signals in human communication
Advancing speech intelligence beyond explicit semantics
Enhancing context-aware human-machine interaction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical framework for spoken interaction levels
Beyond-Semantic Speech (BoSS) for implicit signals
Cognitive relevance and ML for speech dynamics
Qing Wang
Institute of Artificial Intelligence (TeleAI), China Telecom, China.
Zehan Li
PhD, UTHealth Houston
AI for Mental Health, Psychiatry, Biomedical Informatics, LLMs, Clinical Phenotyping
Hang Lv
Institute of Artificial Intelligence (TeleAI), China Telecom, China.
Hongjie Chen
Institute of Artificial Intelligence (TeleAI), China Telecom, China.
Yaodong Song
Institute of Artificial Intelligence (TeleAI), China Telecom, China.
Jian Kang
Institute of Artificial Intelligence (TeleAI), China Telecom, China.
Jie Lian
Institute of Artificial Intelligence (TeleAI), China Telecom, China.
Jie Li
Institute of Artificial Intelligence (TeleAI), China Telecom, China.
Yongxiang Li
Professor, RMIT University
Electronic Materials and Devices
Zhongjiang He
Institute of Artificial Intelligence (TeleAI), China Telecom, China.
Xuelong Li
Institute of Artificial Intelligence (TeleAI), China Telecom, China.