Publications: 'How to Train Data-Efficient LLMs', 'Farzi Data: Autoregressive Data Distillation', 'Off-Policy Evaluation for Large Action Spaces via Policy Convolution', 'Data Distillation: A Survey', 'Infinite Recommendation Networks: A Data-Centric Approach'. Venues include arXiv, The Web Conference (WWW), TMLR, and NeurIPS.
Research Experience
Research Intern at Google DeepMind: Jun 2023 - Aug 2024, with W.C. Kang & D. Cheng; Research Intern at Netflix Research: Jun 2022 - Sep 2022, with D. Liang & N. Kallus; Research Intern at Pinterest: Jun 2021 - Sep 2021, with Jiajing Xu; Research Intern at Microsoft Research: Jan 2020 - Jun 2020, with Manik Varma; Research Assistant at UC San Diego: Aug 2019 - Oct 2019, with Julian McAuley; Research Intern at Cornell University: Jun 2019 - Jul 2019, with Thorsten Joachims.
Education
Ph.D. in Computer Science & Engineering from UC San Diego, 2020-2024, advised by Prof. Julian McAuley; B.Tech & M.S. (by research) in Computer Science & Engineering from IIIT Hyderabad, 2015-2020.
Background
Research Interests: Data-efficient machine learning; Field: Computer Science & Engineering; Brief: Currently a research scientist at Google DeepMind, working on pretraining data for the flagship Gemini and Gemma model families.
Miscellany
The personal website uses a minimalist template created by Jon Barron.