2025: Papers accepted at ICLR (on system 1.x reasoning) and NAACL (on persuasion-balanced training).
2024: Two papers accepted to TMLR on fundamental problems in model editing and unlearning for multimodal models; one was an Outstanding Paper Finalist.
2024: NeurIPS paper on calibrating LLMs’ linguistic expressions of confidence.
2024: ICLR spotlight paper on deleting sensitive information from LLMs to defend against extraction attacks.
2023: Three NeurIPS papers on localization/model editing, mechanistic interpretability for vision models, and LMs teaching weaker agents.
2023: Named Outstanding Area Chair at ACL 2023 (top 1–1.5%).
2024: Serving as Senior Area Chair for ACL 2025; was Area Chair for EACL 2024 in the Interpretability track.
Invited talks at Stanford, OpenAI, CHAI (UC Berkeley), TTIC, and others.