🤖 AI Summary
Current pitch curve generators suffer from two key limitations: (1) difficulty in modeling singer-specific expressivity, and (2) task-specific design (e.g., for pitch correction or singing voice synthesis) that limits generalizability. This paper introduces the first cross-task transferable framework for universal pitch curve generation. It implicitly learns vocal style from reference audio while preserving melodic alignment, enabling high-fidelity stylistic modeling. The method builds on a modified flow-matching architecture, conditioned jointly on symbolic musical scores and pitch-context features, to generate stylistic pitch curves end-to-end. Experiments show that the model significantly outperforms baselines in style similarity and audio naturalness, with marked gains in subjective evaluation, while maintaining pitch accuracy comparable to task-specialized models.
📝 Abstract
Existing pitch curve generators face two main challenges. First, they often neglect singer-specific expressiveness, reducing their ability to capture individual singing styles. Second, they are typically developed as auxiliary modules for specific tasks such as pitch correction, singing voice synthesis, or voice conversion, which restricts their generalization capability. We propose StylePitcher, a general-purpose pitch curve generator that learns singer style from reference audio while preserving alignment with the intended melody. Built upon a rectified flow matching architecture, StylePitcher flexibly incorporates symbolic music scores and pitch context as conditions for generation, and can seamlessly adapt to diverse singing tasks without retraining. Objective and subjective evaluations across various singing tasks demonstrate that StylePitcher improves style similarity and audio quality while maintaining pitch accuracy comparable to task-specific baselines.
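The abstract names rectified flow matching as the backbone but does not spell out the training objective. As background, the core idea is to regress a velocity field along the straight line between a prior sample and the data, then integrate that field at inference. The sketch below is a minimal illustration of that interpolation identity only, with a sine wave standing in for a pitch curve; the conditioning on scores, pitch context, and reference style, and the network itself, are omitted, and all names here are illustrative rather than from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def rectified_flow_pair(x0, x1, t):
    """Point on the straight path from x0 to x1 at time t,
    and the constant target velocity a model would regress."""
    xt = (1.0 - t) * x0 + t * x1
    v_target = x1 - x0
    return xt, v_target

# Toy stand-ins: x1 plays the role of a target pitch curve,
# x0 is a sample from the Gaussian prior.
x1 = np.sin(np.linspace(0.0, 2.0 * np.pi, 64))
x0 = rng.standard_normal(64)

t = 0.3
xt, v = rectified_flow_pair(x0, x1, t)

# At inference, Euler steps follow the predicted velocity. With the
# oracle velocity, one step from t to 1 recovers the target exactly:
x_rec = xt + (1.0 - t) * v
assert np.allclose(x_rec, x1)
```

Because the target velocity is constant along each straight path, sampling can use few integration steps, which is one reason rectified flow variants are attractive for fast conditional generation.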