He has published several papers, including "CMI-Bench: A Comprehensive Benchmark for Evaluating Music Instruction Following", "MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix", "YuE: Scaling Open Foundation Models for Long-Form Music Generation", and "Audio-FLAN: A Preliminary Release", among others.
Research Experience
He was one of the student conductors of the Chinese Philharmonic Orchestra at the Chinese Music Institute, Peking University, and provided technical support for several concerts. During his PhD, he has been involved in multiple research projects, including proposing an acoustic Music undERstanding model with large-scale self-supervised Training (MERT) and establishing the Music Audio Representation Benchmark for universaL Evaluation (MARBLE).
Education
BSc in Mathematics, 2016-2020, School of Mathematical Sciences, Peking University; MSc in Music & Technology, 2020-2022, School of Music, College of Fine Arts, Carnegie Mellon University; PhD in AI & Music, 2022-2026, Centre for Digital Music (C4DM), School of EECS, Queen Mary University of London.
Background
Research Interests: Music Information Retrieval (MIR), Large Language Models (LLMs), Music-related Multimodal Machine Learning, Audio Signal Processing. Biography: MA Yinghao is a PhD candidate in the AI & Music programme at the Centre for Digital Music, Queen Mary University of London, supervised by Dr. Emmanouil Benetos, Dr. Chris Donahue (secondary), and Prof. Simon Dixon (independent assessor). He is one of the co-founders of the Multimodal Art Projection (MAP) community. Together with his colleagues, he proposed an acoustic Music undERstanding model with large-scale self-supervised Training (MERT), which receives more than 10k monthly downloads on Hugging Face; established the Music Audio Representation Benchmark for universaL Evaluation (MARBLE); and developed music generation GPT models such as MuPT. He is also interested in music-related multimodality: he developed MusiLingo, a music captioning and query-response model built by aligning single-modality pre-trained models, as well as multimodal reasoning benchmarks including OmniBench and MMAR.
Miscellany
He is an advocate of charitable activities. He will be open to full-time positions in autumn 2026, focusing on foundation models for music-related multimodality.