Zhisheng Zheng
Google Scholar ID: WYwBrzAAAAAJ
The University of Texas at Austin
Speech and Language Processing · Natural Language Processing · Multimodal Learning
Citations & Impact (All-time)
  • Citations: 514
  • H-index: 10
  • i10-index: 10
  • Publications: 16
  • Co-authors: 5
Academic Achievements
  • 2 papers accepted by EMNLP 2025
  • BAT accepted by ICML 2024
  • EAT: Self-Supervised Pre-Training with Efficient Audio Transformer accepted by IJCAI 2024
  • Released emotion2vec, the first universal speech emotion model, which excels across diverse emotional tasks and languages
  • 1 paper accepted by ICASSP 2024
  • Released Fast-HuBERT, which accelerates HuBERT pre-training with a 5.2× speedup and no performance drop
  • 2 papers accepted by IEEE ASRU 2023
  • MT4SSL shortlisted for the ISCA Interspeech Best Student Paper Award
  • 3 papers accepted by ISCA INTERSPEECH 2023
  • 1 paper accepted by ICASSP 2023
Research Experience
  • Research Intern at Microsoft Research Asia, mentored by Lei He and Xu Tan, focusing on multilingual text-to-speech
  • Research Intern at the SALT Lab at UT Austin in the summer of 2023, collaborating with Prof. David Harwath and Prof. Eunsol Choi
  • Research Intern at the X-Lance Lab at SJTU since 2021, supervised by Prof. Xie Chen
Education
  • Ph.D. in Computer Science, The University of Texas at Austin, 2024 - 2028 (expected)
  • BSc in Electrical Engineering & Zhiyuan Honors Program of Engineering, Shanghai Jiao Tong University, 2020 - 2024
Background
  • Research Interests: Multimodal Large Language Models, Self-Supervised Learning, Speech and Audio Understanding
  • Zhisheng is a second-year Ph.D. student in Computer Science at The University of Texas at Austin