Introduced the first prompt-adaptation method for frozen acoustic models [ICML 21], nominated for best paper; explored the first series of n-best-hypotheses-based generative error correction methods (ASR and translation pre-training and post-training) [ASRU 23]; co-invented Whispering-LLaMA [EMNLP 23] and HyPoradise [NeurIPS 23], and speech post-training for LLMs [ICLR 24]. Received an honorable mention for best industry paper on multimodal n-best correction [ACL 25].
Research Experience
Senior Research Scientist at NVIDIA; previously a full-time employee at Amazon AGI, working with Andreas Stolcke on Ivan Bulyko's team; also an intern with the Google Speech & Brain teams (now DeepMind), collaborating with Bo Li and Yu Zhang on Tara N. Sainath's team.
Education
Ph.D., Georgia Institute of Technology, advised by Prof. Chin-Hui Lee. Before starting the Ph.D., visited Prof. Jesper Tegnér's group to work on self-evolutionary algorithms and interned at TSMC in mixed-signal IC design.
Background
Focused on speech-language alignment and scaling laws. Prior to joining NVIDIA, worked full-time at Amazon AGI with Andreas Stolcke and interned with the Google Speech & Brain teams (now DeepMind), co-hosted by Bo Li and Yu Zhang on Tara N. Sainath's team.