
Puyuan Peng

Google Scholar ID: 2TUruWMAAAAJ

Research Scientist, Meta Superintelligence Lab

Speech-to-speech Chatbot, LLMs, Computer Vision, Artificial Intelligence

Citations & Impact

Citations (all-time): 684


Publications

20 items


Resume

Academic Achievements

VoiceCraft (ACL 2024 Oral): A zero-shot text-to-speech (TTS) and speech editing model whose repository gained 7.7k GitHub stars within five months and ranked #1 globally on the trending list.
Audio-Visual Latent Diffusion Model (ECCV 2024 Oral): Generates realistic action sounds for silent egocentric videos with demonstrated zero-shot transfer in VR games.
PromptingWhisper (Interspeech 2023): Pioneered prompt-based techniques for large speech models to enable zero-shot audio-visual ASR and speech translation without fine-tuning.
Visually Grounded Speech research series (Interspeech 2022/2023, ICASSP 2022, ASRU 2023): Achieved state-of-the-art results in speech-image retrieval, zero-resource speech recognition, and data-efficient representation learning.
Published multiple papers at top-tier venues including ACL, ECCV, ICML, Interspeech, ICLR, EMNLP, ICCV, and WACV.
Contributed to Dynamic-SUPERB Phase-2 (ICLR 2025), a collaborative benchmark with 180 tasks for evaluating spoken language models.

Co-authors

16 total