Publications
Published papers: 'Do Language Models Robustly Acquire New Knowledge?' (NeurIPS CCFM, 2025) and 'Parameters vs FLOPs: Scaling Laws for Optimal Sparsity of MoE Language Models' (ICML, 2025 + ICLR SLLM, 2025). The former explores how robustly language models acquire new knowledge, probed through multi-hop reasoning tasks; the latter investigates the trade-off between the number of parameters and compute per example in sparse Mixture-of-Experts models, and its impact on model performance.
Research Experience
Currently a PhD student at MIT EECS; previously worked for two years at Microsoft Research with Praneeth Netrapalli and Prateek Jain.
Education
Received a BS in CS and Statistics from the University of Illinois at Urbana-Champaign, and has interned at Google Research, Apple MLR, and Akuna Capital. Advised by Aleksander Mądry during his PhD.
Background
PhD student at MIT EECS, interested in understanding and steering large-scale machine learning models. Recent work focuses on developing tools for analyzing model behavior via targeted interventions to learning algorithms, training data, in-context information, and learned representations.