📝 Abstract
Perceived voice likability plays a crucial role in various social and commercial contexts, such as partner selection and advertising. A system that provides reference likable voice samples tailored to target audiences would enable users to adjust their speaking style and voice quality, facilitating smoother communication. To this end, we propose a voice conversion method that controls the likability of input speech while preserving both speaker identity and linguistic content. To improve training data scalability, we train a likability predictor on an existing voice likability dataset and use it to automatically annotate a large speech synthesis corpus with likability ratings. Experimental evaluations reveal a significant correlation between the predictor's outputs and human-provided likability ratings. Subjective and objective evaluations further demonstrate that the proposed approach effectively controls voice likability while preserving both speaker identity and linguistic content.
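The pseudo-labeling step described above — train a likability predictor on a small rated dataset, then use it to annotate a large unlabeled corpus — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the features, ridge-regression predictor, dataset sizes, and all function names (`pool`, `predict_likability`) are hypothetical stand-ins; the paper's predictor and its training data are not specified here.

```python
import numpy as np

rng = np.random.default_rng(0)

def pool(frames):
    """Mean-pool frame-level acoustic features into one utterance embedding."""
    return frames.mean(axis=0)

# --- Hypothetical labeled likability dataset (stand-in for MOS-style ratings) ---
# Each utterance: (n_frames, feat_dim) features plus a scalar rating in [1, 5].
feat_dim = 8
labeled = [(rng.normal(size=(50, feat_dim)), float(rng.uniform(1, 5)))
           for _ in range(40)]

X = np.stack([pool(f) for f, _ in labeled])   # (40, feat_dim) utterance embeddings
y = np.array([r for _, r in labeled])         # (40,) human likability ratings

# Ridge-regression predictor in closed form: w = (X^T X + lam*I)^-1 X^T y
Xb = np.hstack([X, np.ones((len(X), 1))])     # append a bias column
lam = 1e-2
w = np.linalg.solve(Xb.T @ Xb + lam * np.eye(Xb.shape[1]), Xb.T @ y)

def predict_likability(frames):
    """Predict a scalar likability score for one utterance."""
    e = np.append(pool(frames), 1.0)          # embed + bias term
    return float(e @ w)

# --- Pseudo-label a large unlabeled synthesis corpus ---
unlabeled = [rng.normal(size=(60, feat_dim)) for _ in range(5)]
pseudo_labels = [predict_likability(f) for f in unlabeled]
```

The resulting `pseudo_labels` would then serve as likability conditioning targets when training the voice conversion network, replacing costly human annotation at scale.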