Po-Yao (Bernie) Huang

Google Scholar ID: E8K25LIAAAAJ
FAIR, Meta
Multi-modal learning, computer vision, natural language processing
Citations & Impact (all-time)
Citations: 24,482
H-index: 30
i10-index: 54
Publications: 20
Co-authors: 18
Publications
1 item
Demystifying CLIP Data
International Conference on Learning Representations · 2024
Cited by: 77
Resume (English only)
Academic Achievements
  • Contributor to the Llama 3.2 multimodal component as part of the Llama 3 Team
  • Chameleon: Mixed-Modal Early-Fusion Foundation Models (Joint first author; contributed to pre-training)
  • CM3: A Causal Masked Multimodal Model of the Internet
  • DINOv2: Learning robust visual features without supervision (JMLR, 2024)
  • VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild (ACL, 2024)
  • SeamlessM4T—Massively Multilingual & Multimodal Machine Translation (Nature, 2024)
  • Altogether: Image Captioning via Re-aligning Alt-text (EMNLP, 2024)
  • Demystifying CLIP Data (MetaCLIP) (ICLR, 2024)
  • MoDE: CLIP Data Experts via Clustering (CVPR, 2024)
  • MAViL: Masked Audio-Video Learners (NeurIPS, 2023)
  • Diffusion Models as Masked Autoencoders (ICCV, 2023)
  • CiT: Curation in Training for Effective Vision-Language Data (ICCV, 2023)
  • Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles
Research Experience
  • Facebook (Meta): Senior Research Scientist, FAIR Labs (Aug 2022 – present)
  • Facebook (Meta): Research Scientist, FAIR Labs (Aug 2021 – Aug 2022)
  • Facebook (Meta): Research Intern (May 2020 – May 2021)
  • Microsoft: Research Intern, Microsoft Research (Jun 2017 – Aug 2017)
  • MediaTek: Senior Software Engineer (Jun 2012 – Jun 2014)
  • MediaTek: Software Engineer (Sep 2010 – May 2012)