Published several recent papers, including 'Improving Informally Romanized Language Identification' (EMNLP conference), 'Tools of the Scribe: How Writing Systems, Technology, and Human Factors Interact To Affect the Act of Writing' (forthcoming from Springer Nature), and 'Context-aware transliteration of Romanized South Asian languages' (Computational Linguistics journal). Co-organized the Second Workshop on Computation and Written Language (CAWL 2024) and helped establish the ACL Special Interest Group on Writing Systems and Written Language (SIGWrit). Served as co-Editor-in-Chief of the Transactions of the Association for Computational Linguistics (TACL) from 2018 to 2022. Developed resources such as the Dakshina dataset.
Research Experience
Before joining Google as a research scientist in May 2013, was a faculty member for nine years in the Center for Spoken Language Understanding (CSLU) at Oregon Health & Science University (OHSU). Prior to that, worked in the Speech Algorithms Department at AT&T Labs - Research from 2001 to 2004.
Education
Received a Ph.D. from the Department of Cognitive and Linguistic Sciences at Brown University in 2001. Was part of the Brown Laboratory for Linguistic Information Processing.
Background
A computational linguist working on various topics in natural language processing. Research interests include: transliteration and text normalization; language identification; language modeling for automatic speech recognition, text entry, and other applications; weighted transducers and grammars; supervised and unsupervised learning of language models; pronunciation modeling; text entry, accessibility, and augmentative & alternative communication (AAC); syntactic parsing of text and speech; statistical models of human language processing; spoken language processing for diagnosis of neurodevelopmental and neurodegenerative disorders.
Miscellany
Recent activities include a talk on 'Empirical methods in context-aware transliteration' at the Eugene Charniak Memorial Symposium.