
Xuenan Xu

Google Scholar ID: e0h0ae8AAAAJ

Shanghai Jiao Tong University

audio generation, audio understanding, speech synthesis


Citations & Impact

All-time citations: 908


Contact

Email: wsntxxn@gmail.com
Other links: CV, Twitter, GitHub, LinkedIn

Publications

20 items

Foley-Omni: A Unified Multimodal Generation Model from Task-Level Audio Synthesis to Complete Video Soundtrack Generation (2026)

AOT-POT: Adaptive Operator Transformation for Large-Scale PDE Pre-training (2026)

AuDirector: A Self-Reflective Closed-Loop Framework for Immersive Audio Storytelling (2026)

FineLAP: Taming Heterogeneous Supervision for Fine-grained Language-Audio Pretraining (2026)

STEP: Scientific Time-Series Encoder Pretraining via Cross-Domain Distillation (2026)

CAST-TTS: A Simple Cross-Attention Framework for Unified Timbre Control in TTS (2026)

SemanticVocoder: Bridging Audio Generation and Audio Understanding via Semantic Latents (2026)

HoliAntiSpoof: Audio LLM for Holistic Speech Anti-Spoofing (2026)

Resume (English only)

Academic Achievements

Has published multiple papers on audio understanding and generation. Specific projects include: improving the accuracy, diversity, temporal precision, and efficiency of audio captioning; the text-to-audio grounding task and a weakly supervised training paradigm for it; BLAT, Auto-ACD, and detailed audio-text simulation; visually enhanced diverse audio generation; PicoAudio, together with a temporally sensitive evaluation benchmark; an audio codec for audio LLMs (SemantiCodec); and content creation with LLM agents, e.g., AI storytelling for children.

Research Experience

Mainly focuses on general audio understanding and generation, including audio captioning, text-to-audio grounding, audio-text retrieval, and text-to-audio generation. Also interested in speech and music understanding and generation, and in how they interact with general audio.

Education

2019.9 - 2025.6: Ph.D., Shanghai Jiao Tong University, supervised by Prof. Mengyue Wu and Prof. Kai Yu

2023.10 - 2024.4: Visiting Ph.D. student, University of Surrey, supervised by Prof. Mark D. Plumbley and Prof. Wenwu Wang

2015.9 - 2019.6: Bachelor's degree, Shanghai Jiao Tong University, supervised by Leyun Wang

Background

A fourth-year Ph.D. candidate in the X-LANCE Lab, Shanghai Jiao Tong University, supervised by Prof. Mengyue Wu and Prof. Kai Yu. Research interests include audio, speech, and music understanding and generation, as well as large language models.

Miscellany

Expected to graduate in June 2025 and open to job opportunities in 2025. Can be contacted via LinkedIn or WeChat.
