Yinghao Ma
Google Scholar ID: RiYt9toAAAAJ
PhD candidate, Centre for Digital Music (C4DM), Queen Mary University of London
Music Information Retrieval · Large Language Models · Multimodal Learning · Audio Signal Processing
Citations & Impact
All-time
Citations: 818
H-index: 15
i10-index: 17
Publications: 20
Co-authors: 17
Resume (English only)
Academic Achievements
  • Published several papers, including CMI-Bench: A Comprehensive Benchmark for Evaluating Music Instruction Following; MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix; YuE: Scaling Open Foundation Models for Long-Form Music Generation; and Audio-FLAN: A Preliminary Release.
Research Experience
  • He was one of the student conductors of the Chinese Philharmonic Orchestra at the Chinese Music Institute, Peking University, and provided technical support for several concerts. During his PhD, he has worked on multiple research projects, including proposing an acoustic Music undERstanding model with large-scale self-supervised Training (MERT) and establishing the Music Audio Representation Benchmark for universaL Evaluation (MARBLE).
Education
  • BSc in Mathematics, 2016-2020, School of Mathematical Science, Peking University; MSc in Music & Technology, 2020-2022, School of Music, College of Fine Arts, Carnegie Mellon University; PhD in AI Music, 2022-2026, School of EECS, C4DM, QMUL.
Background
  • Research Interests: Music Information Retrieval (MIR), Large Language Models (LLMs), Music-related Multimodal Machine Learning, Audio Signal Processing.
  • Biography: Yinghao Ma is a PhD candidate in the AI & Music programme at the Centre for Digital Music, Queen Mary University of London, supervised by Dr. Emmanouil Benetos, Dr. Chris Donahue (secondary), and Prof. Simon Dixon (independent assessor). He is a co-founder of the Multimodal Art Projection (MAP) community. With colleagues, he proposed MERT, an acoustic Music undERstanding model with large-scale self-supervised Training that receives more than 10k monthly downloads on Hugging Face; established MARBLE, a Music Audio Representation Benchmark for universaL Evaluation; and developed music-generation GPT models such as MuPT. He is also interested in music-related multimodality, and developed MusiLingo, a music captioning and query-response model built by aligning single-modality pre-trained models, along with multimodal reasoning benchmarks including OmniBench and MMAR.
Miscellany
  • An advocate of charitable activities. He will be open to full-time positions from autumn 2026, focusing on foundation models for music-related multimodality.