
Rongjie Huang

Google Scholar ID: iRHBUsgAAAAJ

FAIR, Zhejiang University

Multimedia Computing, Speech, Natural Language Processing


Citations & Impact

Citations (all-time): 3,557


Contact

Email: rongjiehuang@zju.edu.cn

Publications

16 items

HeartMuLa: A Family of Open Sourced Music Foundation Models (2026)

PrismAudio: Decomposed Chain-of-Thoughts and Multi-dimensional Rewards for Video-to-Audio Generation (2025)

Versatile Framework for Song Generation with Prompt-based Control (2025)

Unleashing the Power of Natural Audio Featuring Multiple Sound Sources (2025)

OmniAudio: Generating Spatial Audio from 360-Degree Video (2025)

TechSinger: Technique Controllable Multilingual Singing Voice Synthesis via Flow Matching (2025)

OmniChat: Enhancing Spoken Dialogue Systems with Scalable Synthetic Data for Diverse Scenarios (2025)

FlashAudio: Rectified Flows for Fast and High-Fidelity Text-to-Audio Generation (arXiv.org, 2024)

Resume

Academic Achievements

Published first-author papers at top international AI conferences, including NeurIPS, ICLR, ICML, ACL, and IJCAI. Awarded the Best Thesis Award by the Electrical Engineering Association (April 2025). Released several notable algorithms, including UniAudio and AudioGPT.

Research Experience

Worked on the Seamless team at FAIR. Developed several well-known speech and NLP algorithms, including Seamless-Interaction (Llama 4 + Dyadic Motion Diffusion), AudioGPT, and UniAudio.

Education

Graduated from the College of Computer Science at Zhejiang University, supervised by Prof. Zhou Zhao. Also holds a bachelor's degree from Zhejiang University.

Background

Research interests include multimodal large language models, video-audio generative models, and audio-visual language processing. Previously worked on the Seamless team at FAIR.

Miscellany