Wei-Ning Hsu
Google Scholar ID: N5HDmqoAAAAJ
Facebook AI Research (FAIR)
Speech Processing · Speech Synthesis · Audio Generation · Machine Learning
Citations & Impact
All-time
Citations
13,776
 
H-index
42
 
i10-index
76
 
Publications
20
 
Co-authors
22
Academic Achievements
  • Audio-Visual HuBERT: the first self-supervised model for audio-visual speech, achieving state-of-the-art performance on lip reading, speech recognition, and audio-visual speech recognition with far less labeled data
  • data2vec: the first high-performance self-supervised algorithm that works across speech, vision, and text
  • Textless Speech-to-Speech Translation on Real Data: the first text-free speech-to-speech translation model trained on real data that is on par with text-based models
  • wav2vec-U: an unsupervised speech recognition framework that rivals the best supervised models from two years prior and works for 10 languages
  • Textless NLP: a model capable of prompted or unprompted speech generation without using any text (an audio analogue of GPT-2)
  • HuBERT: a state-of-the-art self-supervised speech representation learning model for recognition, generation, and compression
Research Experience
  • Research Scientist at Facebook AI Research (FAIR)
Education
  • B.S. in Electrical Engineering, National Taiwan University, 2014, supervised by Prof. Lin-shan Lee and Prof. Hsuan-Tien Lin
  • S.M. and Ph.D. in Electrical Engineering and Computer Science, Massachusetts Institute of Technology, 2018 and 2020 respectively, under the supervision of Dr. James Glass
Background
  • Research focuses on representation learning, self-supervised learning, and structured generative modeling for unimodal and multimodal speech. Passionate about reducing the supervision required for speech applications and developing technologies that serve both written and unwritten languages.
Miscellany
  • Based in New York, NY, USA