First-Author Publications include: mWhisper-Flamingo for Multilingual Audio-Visual Noise-Robust Speech Recognition (IEEE Signal Processing Letters, 2025); Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation (Interspeech, 2024); and AV-CPL: Continuous Pseudo-Labeling for Audio-Visual Speech Recognition (Preprint, 2023).
Research Experience
Member of the Sight and Sound project at the MIT-IBM Watson AI Lab, working on multi-modal learning; Intern on Meta's FAIR Seamless team during summer 2024; Worked with Dr. Tatiana Likhomanenko in Apple's Machine Learning Research Group during summer 2023.
Education
PhD student at MIT CSAIL in the Spoken Language Systems Group, advised by Dr. Jim Glass; M.Eng. in EECS from MIT in 2021, advised by Dr. Jim Glass and Professor David Harwath; S.B. in EECS from MIT in 2019, worked with Professor Antonio Torralba and Professor Josh McDermott.
Background
Research Interests: Speech Recognition, Multi-modal Learning; Field: Electrical Engineering and Computer Science.