🤖 AI Summary
This study investigates subjective speech quality evaluation of personalized own-voice reconstruction (OVR) systems in hearable devices, aiming to quantify their real-world gains over generic systems and expose limitations of conventional objective metrics. Method: We propose a personalized OVR framework integrating data augmentation and speaker-adaptive fine-tuning to enhance own-voice intelligibility and naturalness under noise, rigorously evaluated via controlled listening tests and multidimensional objective measures (PESQ, STOI, DNSMOS). Contribution/Results: Personalization yields statistically significant subjective quality improvements for only ~40% of users, revealing strong inter-individual variability. Common objective metrics exhibit weak correlation with subjective scores (|r| < 0.3) and systematically overestimate quality under specific interference conditions. This work provides the first empirical evidence of the high individual dependency of OVR personalization efficacy and the unreliability of mainstream evaluation paradigms, delivering critical methodological insights for future speech reconstruction system design and assessment.
📝 Abstract
Own voice pickup technology for hearable devices facilitates communication in noisy environments. Own voice reconstruction (OVR) systems enhance the quality and intelligibility of the recorded noisy own voice signals. Since disturbances affecting the recorded own voice signals depend on individual factors, personalized OVR systems have the potential to outperform generic OVR systems. In this paper, we propose personalizing OVR systems through data augmentation and fine-tuning, comparing them to their generic counterparts. We investigate the influence of personalization on speech quality assessed by objective metrics and conduct a subjective listening test to evaluate quality under various conditions. In addition, we assess the prediction accuracy of the objective metrics by comparing predicted quality with subjectively measured quality. Our findings suggest that personalized OVR provides benefits over generic OVR for some talkers only. Our results also indicate that performance comparisons between systems are not always accurately predicted by objective metrics. In particular, certain disturbances lead to a consistent overestimation of quality compared to actual subjective ratings.