Publications
Published several papers, including 'A Practitioner’s Guide to Continual Multimodal Pretraining', 'No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance', 'Efficient Model Evaluation in an Era of Rapid Progress', 'CiteME: Can Language Models Accurately Cite Scientific Claims?', and 'Visual Data-Type Understanding does not emerge from Scaling Vision-Language Models'.
Research Experience
Interned at Google Zürich, working with Yongqin Xian, Alessio Tonioni, Federico Tombari, and Olivier Hénaff. Collaborated closely with Ferjad Naeem, Nikhil Parthasarathy, and Talfan Evans.
Education
Jointly working with Matthias Bethge at the University of Tübingen and Samuel Albanie at the University of Cambridge/Google DeepMind. Also part of the International Max Planck Research School for Intelligent Systems. Previously, an MPhil student in Machine Learning and Machine Intelligence at the University of Cambridge, with a thesis on 'Understanding and Fixing the Modality Gap in VLMs'. Graduated from IIIT Delhi with a Bachelor's in Computer Science in July 2020.
Background
Third-year ELLIS PhD student with research interests in data-centric machine learning, robustness and generalization under distribution shifts, and foundation models. Mainly focused on understanding the generalization properties of foundation models (such as vision-language models and large multimodal models) through their pre-training and test data distributions.
Miscellany
Previously worked with several mentors, including Ankush Gupta (Google DeepMind), Sungjin Ahn (KAIST), Tanmoy Chakraborty (IIT Delhi), Rajiv Ratn Shah (IIIT Delhi), Saket Anand (IIIT Delhi), Rajesh Kumar (Bucknell University), Anubha Gupta (IIIT Delhi), and Jainendra Shukla (IIIT Delhi).