🤖 AI Summary
This work addresses the lack of interpretability in speech representations by proposing Vo-Ve, a speaker-identity-oriented, interpretable voice-vector embedding. Methodologically, Vo-Ve models speaker embeddings as probability distributions over explicit voice-attribute classes (e.g., pitch, formants, articulation clarity), departing from conventional black-box feature vectors. A deep neural network jointly models acoustic features and attribute semantics, enabling end-to-end learning of embeddings that are both discriminative and interpretable. Experiments demonstrate that Vo-Ve performs on par with state-of-the-art speaker embeddings (e.g., x-vector, ECAPA-TDNN) on speaker-similarity evaluation while also providing fine-grained, attribute-level explanations, such as attributing a high similarity score primarily to comparable fundamental-frequency and nasality distributions. The authors argue that this explainability can strengthen evaluation schemes across a range of speech tasks.
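To make the idea concrete, here is a minimal sketch of how attribute-probability embeddings yield an interpretable similarity score. The attribute names, the toy probability vectors, and the use of cosine similarity with per-attribute dot-product shares are illustrative assumptions for this sketch, not the paper's actual attribute inventory or scoring method.

```python
import math

# Hypothetical voice-attribute classes; the paper's real inventory is not
# specified in this summary, so these names are placeholders.
ATTRIBUTES = ["low_pitch", "high_pitch", "nasal", "breathy", "clear_articulation"]

def cosine_similarity(a, b):
    """Standard cosine similarity between two attribute-probability vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def attribute_contributions(a, b):
    """Share of the raw dot product contributed by each attribute: a crude
    proxy for which attributes drive the similarity score."""
    prods = [x * y for x, y in zip(a, b)]
    total = sum(prods)
    return {name: p / total for name, p in zip(ATTRIBUTES, prods)}

# Toy Vo-Ve-style embeddings: probabilities over attribute classes (sum to 1).
spk_a = [0.50, 0.05, 0.30, 0.05, 0.10]
spk_b = [0.45, 0.10, 0.25, 0.10, 0.10]

sim = cosine_similarity(spk_a, spk_b)
contrib = attribute_contributions(spk_a, spk_b)
top_attribute = max(contrib, key=contrib.get)
```

Because every dimension is a named attribute probability, the similarity score can be decomposed attribute by attribute, which is the kind of explanation (e.g., "these speakers sound alike mainly in pitch") that opaque embeddings cannot provide.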
📝 Abstract
In this paper, we propose Vo-Ve, a novel voice-vector embedding that captures speaker identity. Unlike conventional speaker embeddings, Vo-Ve is explainable, as it contains the probabilities of explicit voice attribute classes. Through extensive analysis, we demonstrate that Vo-Ve not only evaluates speaker similarity competitively with conventional techniques but also provides an interpretable explanation in terms of voice attributes. We strongly believe that Vo-Ve can enhance evaluation schemes across various speech tasks due to its high-level explainability.