Co-authored 'Robust agents learn causal world models', which received an Oral presentation and an Outstanding Paper Honorable Mention at ICLR 2024
Published 'Evaluating the Goal-Directedness of Large Language Models', introducing a measure of LLM goal-directedness that is empirically predictive and consistent across tasks
Co-authored 'The Reasons that Agents Act: Intention and Instrumental Goals' (AAMAS 2024), formalizing intent in causal models
Co-wrote 'AGI Safety Literature Review' (IJCAI 2018), a comprehensive survey of the AGI safety field
Co-developed 'AI Safety Gridworlds' (2017), making AGI safety problems concrete through testable environments
Proposed modeling AGI safety frameworks using causal influence diagrams (IJCAI AI Safety Workshop, 2019)
Developed a general method to infer agent incentives directly from graphical models, notably in 'Agent Incentives: A Causal Perspective' (AAAI 2021)
Conducted foundational AGI safety research based on the universal artificial intelligence (UAI/AIXI) framework, including a 2016 paper with Marcus Hutter
Background
Staff Research Scientist at Google DeepMind
Focuses on AGI safety: how to safely build and use highly intelligent AI systems
Authored the first PhD thesis specifically on AGI safety: 'Towards Safe Artificial General Intelligence'
Currently exploring AGI safety approaches based on amplification of human agency
Led the Causal Incentives Working Group, developing alignment theory grounded in Pearlian causality