Puyuan Peng

Google Scholar ID: 2TUruWMAAAAJ
Research Scientist, Meta Superintelligence Lab
Speech-to-speech Chatbot, LLMs, Computer Vision, Artificial Intelligence
Citations & Impact (all-time)
  • Citations: 684
  • H-index: 13
  • i10-index: 14
  • Publications: 20
  • Co-authors: 16
Resume
Academic Achievements
  • VoiceCraft (ACL 2024 Oral): A zero-shot TTS and speech editing model that gained 7.7k GitHub stars within five months and ranked #1 on GitHub's global trending list.
  • Audio-Visual Latent Diffusion Model (ECCV 2024 Oral): Generates realistic action sounds for silent egocentric videos with demonstrated zero-shot transfer in VR games.
  • PromptingWhisper (Interspeech 2023): Pioneered prompt-based techniques for large speech models to enable zero-shot audio-visual ASR and speech translation without fine-tuning.
  • Visually Grounded Speech research series (Interspeech 2022/2023, ICASSP 2022, ASRU 2023): Achieved state-of-the-art results in speech-image retrieval, zero-resource speech recognition, and data-efficient representation learning.
  • Published multiple papers at top-tier venues including ACL, ECCV, ICML, Interspeech, ICLR, EMNLP, ICCV, and WACV.
  • Contributed to Dynamic-SUPERB Phase-2 (ICLR 2025), a collaborative benchmark with 180 tasks for evaluating spoken language models.