
Alexander H. Liu

Google Scholar ID: LIiCDa0AAAAJ

Massachusetts Institute of Technology


Citations & Impact

All-time citations: 2,354


Contact

Email: alexhliu@mit.edu

Publications

13 items

Overcoming State Inertia in Full-Duplex Spoken Language Models via Activation Steering

2026


USAD 2.0: Scaling Representation Distillation for Universal Audio Understanding

2026


Voxtral TTS

2026


Voxtral Realtime

2026


Ministral 3

2026


Schrodinger Audio-Visual Editor: Object-Level Audiovisual Removal

2025


Voxtral

2025


USAD: Universal Speech and Audio Representation via Distillation

2025


Resume

Academic Achievements

1. SHuBERT: Self-Supervised Sign Language Representation Learning via Multi-Stream Cluster Prediction, ACL 2025 (Oral)
2. UniWav: Towards Unified Pre-training for Speech Representation Learning and Generation, ICLR 2025
3. Generative Speech Foundation Model Pretraining for High-Quality Speech Extraction and Restoration, ICASSP 2025
4. Generative Pre-training for Speech with Flow Matching, ICLR 2024
5. Revisiting Self-supervised Learning of Speech Representation from a Mutual Information Perspective, ICASSP 2024
6. DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning, NeurIPS 2023
7. Joint Audio and Speech Understanding, ASRU 2023
8. Listen, Think, and Understand, ICLR 2024

Research Experience

1. Member of the Spoken Language Systems (SLS) Group at MIT, working on natural language and speech processing
2. Member of the Speech Processing Lab at NTU, working on machine learning and speech processing
3. Research intern at Facebook AI Research (now FAIR at Meta AI) and NVIDIA Applied Deep Learning Research (ADLR)

Education

1. Ph.D. candidate at Massachusetts Institute of Technology (MIT) Computer Science and Artificial Intelligence Laboratory (CSAIL), Advisor: Dr. James Glass
2. M.S. in Computer Science & Information Engineering (CSIE) from National Taiwan University (NTU), Advisors: Prof. Lin-shan Lee and Prof. Hung-yi Lee
3. B.S. in Computer Science & Information Engineering (CSIE) from National Taiwan University (NTU), Advisor: Yu-Chiang Frank Wang

Background

Research interests include natural language and speech processing, with the goal of building machines that can seamlessly interact with humans through voice. Specific areas include multimodal audio representation learning, multimodal alignment, large language models, and generative models for audio.

Miscellany

1. Currently on leave until Spring 2026, working at Mistral AI to build frontier open audio models such as Voxtral
2. Inspired by Wei-Chiu Ma, committed to providing 1-2 hours per week of advice and/or mentorship to junior students in need, especially those from underrepresented groups

Co-authors

11 total