December 2024: Paper identifying (in)abilities of SAEs awarded Best Paper at the NeurIPS Foundation Model Interventions workshop!
September 2024: Paper on hidden capabilities in generative models accepted as a spotlight at NeurIPS, 2024.
August 2024: Preprint on a percolation model of emergent capabilities is now on arXiv.
June 2024: Paper on identifying how jailbreaks bypass safety mechanisms accepted at NeurIPS, 2024.
October 2023: Paper on analyzing in-context learning as a subjective randomness task accepted to ICLR, 2024.
October 2023: Our work on multiplicative emergence of compositional abilities was accepted to NeurIPS, 2023.
April 2023: Our work on a mechanistic understanding of loss landscapes was accepted to ICML, 2023.
January 2023: Our work analyzing loss landscape of self-supervised objectives was accepted to ICLR, 2023.
October 2021: Our work on dynamics of normalization layers was accepted to NeurIPS, 2021.
March 2021: Our work on theory of pruning was accepted as a spotlight at ICLR, 2021.
Research Experience
He is currently a research fellow at the CBS-NTT Program in Physics of Intelligence at Harvard University, where he leads the phenomenological theory team and frequently collaborates with Hidenori Tanaka, David Krueger, and Demba Ba. His undergraduate research focused primarily on embedded systems, such as energy-efficient machine vision systems.
Education
He graduated with a Bachelor's degree in ECE from the Indian Institute of Technology (IIT) Roorkee in 2019, and completed his PhD co-affiliated with EECS at the University of Michigan and CBS at Harvard, advised by Robert Dick and Hidenori Tanaka.
Background
His research interests include designing (faithful) abstractions of phenomena relevant to controlling or aligning neural networks, and better understanding the training dynamics of neural networks, especially from a statistical physics perspective.