Scholar

Yuzhe Liang

Google Scholar ID: deUSxiYAAAAJ

Shanghai Jiao Tong University

Deep learning, Multimodal Learning

Citations & Impact

All-time

Citations

191

H-index

7

i10-index

5

Publications

14

Co-authors

4

list available

Publications

14 items

Towards Unified Song Generation and Singing Voice Conversion with Accompaniment Co-Generation

2026

Cited

0

WavTTS: Towards High-Quality Zero-Shot TTS via Direct Raw Waveform Modeling

2026

Cited

0

UniSonate: A Unified Model for Speech, Music, and Sound Effect Generation with Text Instructions

2026

Cited

0

V2A-DPO: Omni-Preference Optimization for Video-to-Audio Generation

2026

Cited

0

Habibi: Laying the Open-Source Foundation of Unified-Dialectal Arabic Speech Synthesis

2026

Cited

0

MM-Sonate: Multimodal Controllable Audio-Video Generation with Zero-Shot Voice Cloning

arXiv.org · 2026

Cited

1

DMP-TTS: Disentangled multi-modal Prompting for Controllable Text-to-Speech with Chained Guidance

2025

Cited

0

M3-TTS: Multi-modal DiT Alignment & Mel-latent for Zero-shot High-fidelity Speech Synthesis

2025

Cited

0

Co-authors

4 total

Shanghai Jiao Tong University <- Microsoft <- Cambridge University

Shanghai Jiao Tong University

The University of Texas at Austin

Shanghai Jiao Tong University