Published multiple papers at venues including EMNLP, ICML, and ACL, as well as preprints on arXiv, among them 'Foundational Automatic Evaluators: Scaling Multi-Task Generative Evaluator Training for Reasoning-Centric Domains' and 'Hard2Verify: A Step-Level Verification Benchmark for Open-Ended Frontier Math'.
Research Experience
Research Scientist at Salesforce AI Research. During his graduate studies, he interned at Duolingo and Amazon.
Education
PhD from Georgia Tech, where he was advised by Mark Davenport and collaborated closely with Ashwin Pananjady. BSE from the University of Michigan. During graduate school, he interned at Duolingo, working with Will Monroe, and at Amazon, working with Arjun Seshadri, Mariya Vasileva, and Achal Dave.
Background
Research Scientist at Salesforce AI Research. His current research focuses on improving the reasoning ability of foundation models, particularly large language models, with an emphasis on automatic evaluation (generative reward models / LLM-as-a-judge). He works largely on post-training and evaluation. More broadly, he is interested in the role humans play in the era of large models: when are human responses necessary, and when can we avoid collecting human feedback?
Miscellany
In his free time, he enjoys cooking (and eating), reading, running, and watching basketball (NBA and college).