🤖 AI Summary
This study investigates how stereo processing, specifically Mid/Side (MS) versus Left/Right (LR) encoding, affects perceived audio quality. It also evaluates how accurately mainstream objective metrics (e.g., PEAQ, standardized as ITU-R BS.1387, and DNSMOS) predict quality under spatial distortions. Using the stereo extension of the ODAQ dataset and its subjective quality ratings, we compare metrics in the time–frequency domain and statistically model the relationship between metric outputs and listener scores. The analysis highlights the interplay between bottom-up auditory mechanisms and top-down contextual factors in stereo quality prediction. Results show that timbre-oriented metrics remain robust under simple conditions but degrade markedly when the presentation context becomes more complex, as with spatial distortions. Current models show systematic bias because they largely neglect the spatial dimension. The findings point toward a three-dimensional perceptual evaluation approach that integrates temporal, spectral, and spatial cues, offering a basis for next-generation audio quality metrics.
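The summary contrasts Mid/Side and Left/Right encoding. To illustrate why a spatial distortion can escape a timbre-focused comparison, here is a minimal sketch. It assumes the common 1/2-scaled M/S convention (codecs differ, and some use 1/√2). The helpers `lr_to_ms`, `ms_to_lr`, and `interchannel_correlation`, the 48 kHz rate, and the "discard the side channel" distortion are all hypothetical and are not taken from the paper or from ODAQ. Zeroing S collapses the stereo image, driving the inter-channel correlation to 1, yet it leaves M untouched. A comparison restricted to the mid signal would therefore register no change.

```python
import numpy as np

def lr_to_ms(left, right):
    """Encode L/R into M/S using the 1/2-scaled convention (one of several in use)."""
    mid = 0.5 * (left + right)
    side = 0.5 * (left - right)
    return mid, side

def ms_to_lr(mid, side):
    """Decode M/S back to L/R; the exact inverse of lr_to_ms."""
    return mid + side, mid - side

def interchannel_correlation(left, right):
    """Zero-lag normalized cross-correlation: a crude proxy for stereo width."""
    return float(np.corrcoef(left, right)[0, 1])

rng = np.random.default_rng(0)
n = 48_000  # one second at an assumed 48 kHz sample rate
common = rng.standard_normal(n)  # component shared by both channels
left = common + 0.5 * rng.standard_normal(n)
right = common + 0.5 * rng.standard_normal(n)

mid, side = lr_to_ms(left, right)
left_rt, right_rt = ms_to_lr(mid, side)  # lossless round trip

# Hypothetical "spatial" distortion: discard the side channel entirely.
# The mid signal is unchanged, but both output channels become identical.
left_mono, right_mono = ms_to_lr(mid, np.zeros_like(side))

print(interchannel_correlation(left, right))            # roughly 0.8 for this construction
print(interchannel_correlation(left_mono, right_mono))  # approximately 1.0: image collapsed
```

The 1/2 scaling is chosen so that decoding is a plain sum and difference. With a 1/√2 scaling, the decoder would need the matching factor.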
📝 Abstract
ODAQ (Open Dataset of Audio Quality) provides a comprehensive framework for exploring both monaural and binaural audio quality degradations across a range of distortion classes and signals, accompanied by subjective quality ratings. A recent update of ODAQ focuses on the impact of stereo processing methods such as Mid/Side (MS) and Left/Right (LR) coding. It provides test signals and subjective ratings that enable an in-depth investigation of state-of-the-art objective audio quality metrics. Our evaluation results suggest that timbre-focused metrics often yield robust results under simpler conditions, but their prediction performance tends to suffer under conditions with a more complex presentation context. Our findings underscore the importance of modeling the interplay of bottom-up psychoacoustic processes and top-down contextual factors. They guide future research toward models that more effectively integrate both the timbral and spatial dimensions of perceived audio quality.
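The abstract reports prediction performance that holds under simpler conditions but drops under more complex ones. One common way to quantify this is to correlate metric outputs with subjective scores both overall and within condition subsets. That choice is assumed here, not taken from the paper. The sketch below uses only NumPy and computes Pearson and Spearman correlations, with average ranks for ties. The helper names `average_ranks`, `pearson`, `spearman`, and `evaluate_metric`, the group labels, and all numbers are hypothetical toy values, not ODAQ data or results.

```python
import numpy as np

def average_ranks(x):
    """Return 1-based ranks; tied values share the mean of the ranks they span."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x), dtype=float)
    i = 0
    while i < len(x):
        j = i
        # Advance j to the end of the run of values tied with sorted_x[i].
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = 0.5 * (i + j) + 1.0
        i = j + 1
    return ranks

def pearson(a, b):
    """Linear correlation between two score vectors."""
    return float(np.corrcoef(a, b)[0, 1])

def spearman(a, b):
    """Rank correlation: the Pearson correlation of the average ranks."""
    return pearson(average_ranks(a), average_ranks(b))

def evaluate_metric(objective, subjective, groups=None):
    """Return {subset: (pearson, spearman)} overall and, optionally, per group label."""
    objective = np.asarray(objective, dtype=float)
    subjective = np.asarray(subjective, dtype=float)
    results = {"all": (pearson(objective, subjective), spearman(objective, subjective))}
    if groups is not None:
        groups = np.asarray(groups)
        for g in np.unique(groups):
            mask = groups == g
            results[str(g)] = (pearson(objective[mask], subjective[mask]),
                               spearman(objective[mask], subjective[mask]))
    return results

# Toy values for illustration only; these are NOT ODAQ data or results.
subjective = [20, 35, 50, 70, 90, 30, 45, 60, 80, 95]
objective = [1.0, 1.5, 2.5, 4.0, 4.8, 3.0, 2.0, 3.5, 3.2, 4.5]
groups = ["simple"] * 5 + ["complex"] * 5

res = evaluate_metric(objective, subjective, groups)
for subset, (r, rho) in res.items():
    print(f"{subset:8s} Pearson={r:.3f} Spearman={rho:.3f}")
```

With these toy numbers, the rank correlation is 1.000 for the "simple" subset and 0.800 for the "complex" one. The example shows how an aggregate figure can hide a subset where a metric orders conditions less reliably. `scipy.stats.pearsonr` and `scipy.stats.spearmanr` return the same coefficients along with p-values. The hand-rolled version only keeps the sketch dependency-free.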