Selected publications include 'Robust and Unbounded Length Generalization in Autoregressive Transformer-Based Text-to-Speech' (NAACL 2025) and 'Wave-Tacotron: Spectrogram-Free End-to-End Text-to-Speech Synthesis' (ICASSP 2021).
Research Experience
Joined Google Research in 2017 and is currently part of the Foundational Research organization within Google DeepMind. Previously worked as a Research Scientist at Baidu Silicon Valley Artificial Intelligence Lab (SVAIL), contributing to the Deep Speech 2 project. Before that, developed algorithms for audio event detection and music mood classification at Gracenote in Emeryville, CA.
Education
PhD in Electrical Engineering and Computer Sciences, University of California, Berkeley, 2012; advised by David Wessel (CNMAT) and Nelson Morgan (ICSI)
MS in Electrical Engineering and Computer Sciences, University of California, Berkeley, 2008
BS in Electrical Engineering, University of California, Santa Barbara, 2005
Background
Research interests include multimodal AI, sequence modeling, audio generation and understanding, speech synthesis and recognition, deep learning and neural networks, and parallel and accelerated computing. Currently a Research Scientist at Google DeepMind, focusing on generative modeling and machine perception to make interactions with technology more natural and seamless.