Bottleneck Transformer-Based Approach for Improved Automatic STOI Score Prediction

📅 2026-02-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes a non-intrusive speech intelligibility prediction method based on a bottleneck Transformer architecture. It addresses a limitation of the traditional Short-Time Objective Intelligibility (STOI) metric, whose computation requires clean reference speech and is therefore impractical in real-world, reference-free scenarios. The proposed model uses convolutional blocks to extract frame-level acoustic features and a multi-head self-attention layer to aggregate them, emphasizing the most informative parts of the input. Its inputs combine self-supervised learning (SSL) representations with spectral features. Experimental results show that the method outperforms the state-of-the-art model in both seen and unseen test conditions, achieving higher prediction correlation and lower mean squared error.
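
To make the data flow described above concrete, the following is a minimal NumPy sketch of a convolution block followed by a bottleneck with multi-head self-attention and a pooled score output. It is not the authors' implementation: the function names, layer widths, random weights, residual connection, mean pooling over time, and sigmoid output are all assumptions chosen for illustration, since the summary does not specify them.

```python
import numpy as np


def conv1d_same(x, w, b):
    """Zero-padded 1-D convolution over time.
    x: (T, C_in), w: (K, C_in, C_out) with odd K, b: (C_out,) -> (T, C_out)."""
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    frames = [np.tensordot(xp[i:i + k], w, axes=([0, 1], [0, 1]))
              for i in range(x.shape[0])]
    return np.stack(frames) + b


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def mhsa(x, wq, wk, wv, wo, n_heads):
    """Multi-head self-attention over frames. x: (T, D) -> (T, D), weights (H, T, T)."""
    t, d = x.shape
    dh = d // n_heads

    def split(m):  # (T, D) -> (H, T, dh)
        return m.reshape(t, n_heads, dh).transpose(1, 0, 2)

    q, k, v = split(x @ wq), split(x @ wk), split(x @ wv)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(dh))
    ctx = (attn @ v).transpose(1, 0, 2).reshape(t, d)
    return ctx @ wo, attn


def init_params(n_feats, width, bottleneck, kernel=3, seed=0):
    """Random weights for illustration only (hypothetical sizes, no training)."""
    rng = np.random.default_rng(seed)
    g = lambda *shape: rng.normal(0.0, 0.1, size=shape)
    return {
        "conv_w": g(kernel, n_feats, width), "conv_b": np.zeros(width),
        "down": g(width, bottleneck), "up": g(bottleneck, width),
        "wq": g(bottleneck, bottleneck), "wk": g(bottleneck, bottleneck),
        "wv": g(bottleneck, bottleneck), "wo": g(bottleneck, bottleneck),
        "out_w": g(width), "out_b": 0.0,
    }


def predict_score(feats, p, n_heads=4):
    """feats: (T, n_feats) frame-level input -> (score in (0, 1), attention weights)."""
    h = np.maximum(conv1d_same(feats, p["conv_w"], p["conv_b"]), 0.0)  # frame-level features
    z = np.maximum(h @ p["down"], 0.0)            # project down to the bottleneck width
    z, attn = mhsa(z, p["wq"], p["wk"], p["wv"], p["wo"], n_heads)
    h = h + z @ p["up"]                           # project back up, residual connection (assumed)
    pooled = h.mean(axis=0)                       # utterance-level pooling (assumed)
    logit = pooled @ p["out_w"] + p["out_b"]
    return float(1.0 / (1.0 + np.exp(-logit))), attn


# Usage with random stand-in features: 50 frames, 40 feature bins (both hypothetical).
params = init_params(n_feats=40, width=64, bottleneck=32)
features = np.random.default_rng(1).normal(size=(50, 40))
score, attention = predict_score(features, params)
```

With untrained weights the score carries no meaning; the sketch only shows how frame-level features pass through attention and collapse into a single utterance-level prediction.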

📝 Abstract
In this study, we present a novel approach to predicting the Short-Time Objective Intelligibility (STOI) metric using a bottleneck transformer architecture. Traditional methods for calculating STOI typically require clean reference speech, which limits their applicability in the real world. To address this limitation, numerous deep learning-based nonintrusive speech assessment models have been proposed and have garnered significant interest. Many studies have achieved commendable performance, but there is room for further improvement. We propose the use of a bottleneck transformer, incorporating convolution blocks for learning frame-level features and a multi-head self-attention (MHSA) layer to aggregate the information. These components enable the transformer to focus on the key aspects of the input data. Our model shows higher correlation and lower mean squared error in both seen and unseen scenarios than the state-of-the-art model using self-supervised learning (SSL) and spectral features as inputs.
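
The two evaluation quantities named in the abstract can be sketched in a few lines of standard-library Python. The abstract does not say which correlation coefficient is reported, so Pearson (linear) correlation is assumed here; the function names and the example scores are hypothetical and are not results from the paper.

```python
from math import sqrt


def mean_squared_error(pred, target):
    """Average squared difference between predicted and reference scores."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def pearson_correlation(pred, target):
    """Pearson linear correlation coefficient (assumed choice of correlation)."""
    if len(pred) != len(target) or len(pred) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    n = len(pred)
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_t = sum((t - mean_t) ** 2 for t in target)
    if var_p == 0 or var_t == 0:
        raise ValueError("correlation is undefined for constant inputs")
    return cov / sqrt(var_p * var_t)


# Made-up scores for illustration only: model predictions vs. reference STOI values.
predicted = [0.62, 0.81, 0.45, 0.93]
reference = [0.60, 0.85, 0.40, 0.95]
mse_value = mean_squared_error(predicted, reference)
corr_value = pearson_correlation(predicted, reference)
```

A better predictor pushes the correlation toward 1 and the mean squared error toward 0, which is the sense in which the abstract reports "higher correlation and lower mean squared error".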
Problem

Research questions and friction points this paper is trying to address.

STOI
nonintrusive speech assessment
speech intelligibility prediction
reference-free evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

bottleneck transformer
non-intrusive STOI prediction
multi-head self-attention
frame-level feature extraction
speech intelligibility assessment
Authors
Amartyaveer
Spire Lab, Dept of Electrical Engg, Indian Institute of Science (IISc), Bengaluru, India
Murali Kadambi
Spire Lab, Dept of Electrical Engg, Indian Institute of Science (IISc), Bengaluru, India
Chandra Mohan Sharma
Center for Artificial Intelligence and Robotics (CAIR), DRDO, India
Anupam Mondal
Center for Artificial Intelligence and Robotics (CAIR), DRDO, India
Prasanta Kumar Ghosh
Associate Professor, Indian Institute of Science (IISc), Bangalore
Human-centered signal and information processing