Yuchen Hu

Google Scholar ID: Neo-1mIAAAAJ
Nanyang Technological University
Speech · LLM · Multimodal
Citations & Impact (all-time)
  • Citations: 938
  • H-index: 20
  • i10-index: 32
  • Publications: 20
  • Co-authors: 19
Publications (1 item)
  • Improving Code-Switching Speech Recognition with TTS Data Augmentation
    Asia-Pacific Signal and Information Processing Association Annual Summit and Conference · 2025 · Cited by 0
Resume (English only)
Academic Achievements
  • ICLR 2025: Audio Large Language Models Can Be Descriptive Speech Quality Evaluators
  • ICLR 2025: GenSE: Generative Speech Enhancement via Language Models using Hierarchical Modeling
  • Preprint: Robust Zero-Shot Text-to-Speech Synthesis with Reverse Inference Optimization
  • Preprint: Enhancing Zero-shot Text-to-Speech Synthesis with Human Feedback
  • ICASSP 2025: SSR-Speech: Towards Stable, Safe and Robust Zero-shot Text-based Speech Editing and Synthesis
  • NeurIPS 2024: Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models
  • ACL 2024 (Oral): GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators
  • ACL 2024: Listen Again and Choose the Right Answer: A New Paradigm for Automatic Speech Recognition with Large Language Models
  • ACL 2024: Overcoming Catastrophic Forgetting by Exemplar Selection in Task-oriented Dialogue System
  • ICLR 2024 (Spotlight, Top 5%): Large Language Models are Efficient Learners of Noise-Robust Speech Recognition
  • ICLR 2024: It’s Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition
  • NeurIPS 2023: HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models
  • AAAI 2024: Multichannel AV-wav2vec2: A Framework for Learning Multichannel Multi-modal Speech Representation
  • ICASSP 2024: Cross-Modality and Within-Modality Regularization for Audio-Visual DeepFake Detection
  • ICASSP 2024: An Experimental Comparison of Noise-Robust Text-To-Speech Synthesis Systems Based On Self-Supervised Representation
  • ACL 2023 (Oral): Hearing Lips in Noise: Universal Viseme-Phoneme Mapping and Transfer for Robust Audio-Visual Speech Recognition
Research Experience
  • Final-year Ph.D. student at the School of Computer Science and Engineering, Nanyang Technological University, working on speech- and multimodal-related research.
Background
  • Research interests: full-duplex spoken dialogue systems; text-to-speech synthesis (RLHF, streaming); generative sequence-to-sequence learning; speech recognition, translation, and enhancement; efficient adaptation of foundation models; and multimodal research, including video-to-audio generation and audio-visual understanding.