Holds a Schmidt Sciences AI2050 Senior Fellowship, a Sloan Fellowship, and a Canada CIFAR AI Chair.
Research Experience
Associate Professor of Computer Science at the University of Toronto and Member of Technical Staff on Anthropic's Alignment Science team. Research interests include efficiently identifying which training examples were responsible for surprising behaviors from an AI system; assuring the safety of AI systems capable of strategic deception; eliciting reliable information from models we don't fully trust; and efficiently removing dangerous or unwanted information from a trained model.
Education
No specific educational background information provided.
Background
Associate Professor of Computer Science at the University of Toronto, Schwartz Reisman Chair in Technology and Society, and a founding member of the Vector Institute. Member of Technical Staff on the Alignment Science team at Anthropic, focusing on training data attribution. Research has focused on better understanding neural net training dynamics in order to improve training speed, generalization, uncertainty estimation, and automatic hyperparameter tuning. Current work applies this understanding to AI alignment, specifically to ensuring that AIs are robustly aligned with human values.
Miscellany
Contact: Department of Computer Science, University of Toronto, Office: Pratt 290F, 6 King's College Rd., Toronto, ON M5S 3G4, Canada. Phone: 416-978-7391, e-mail: rgrosse_at_cs_toronto_edu.