Yinghao Ma
Google Scholar ID: RiYt9toAAAAJ
PhD candidate, Centre for Digital Music (C4DM), Queen Mary University of London
Music Information Retrieval · Large Language Models · Multimodal Learning · Audio Signal Processing
Citations & Impact
All-time
Citations: 818
H-index: 15
i10-index: 17
Publications: 20
Co-authors: 17
Resume (English only)
Academic Achievements
  • Published several papers, including CMI-Bench: A Comprehensive Benchmark for Evaluating Music Instruction Following; MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix; YuE: Scaling Open Foundation Models for Long-Form Music Generation; and Audio-FLAN: A Preliminary Release.
Research Experience
  • He was one of the student conductors of the Chinese Philharmonic Orchestra at the Chinese Music Institute, Peking University, and provided technical support for several concerts. During his PhD, he has worked on multiple research projects, including proposing an acoustic Music undERstanding model with large-scale self-supervised Training (MERT) and establishing the Music Audio Representation Benchmark for universaL Evaluation (MARBLE).
Education
  • BSc in Mathematics, 2016-2020, School of Mathematical Science, Peking University; MSc in Music & Technology, 2020-2022, School of Music, College of Fine Arts, Carnegie Mellon University; PhD in AI Music, 2022-2026, School of EECS, C4DM, QMUL.
Background
  • Research Interests: Music Information Retrieval (MIR), Large Language Models (LLMs), Music-related Multimodal Machine Learning, Audio Signal Processing.
  • Biography: Yinghao Ma is a PhD candidate in the AI & Music programme at the Centre for Digital Music, Queen Mary University of London, supervised by Dr. Emmanouil Benetos, Dr. Chris Donahue (secondary), and Prof. Simon Dixon (independent assessor). He is a co-founder of the Multimodal Art Projection (MAP) community. With colleagues, he proposed MERT, an acoustic Music undERstanding model with large-scale self-supervised Training that receives more than 10k monthly downloads on Hugging Face; established MARBLE, a Music Audio Representation Benchmark for universaL Evaluation; and developed music-generation GPT models such as MuPT. He is also interested in music-related multimodality, and developed MusiLingo, a music captioning and query-response model built by aligning single-modality pre-trained models, along with multimodal reasoning benchmarks including OmniBench and MMAR.
Miscellany
  • An advocate of charitable activities. He will be open to full-time positions from autumn 2026, focusing on foundation models for music-related multimodality.