Tom Everitt

Google Scholar ID: BdulyjIAAAAJ
Staff Research Scientist at Google DeepMind
AI Safety · Artificial General Intelligence · Causality · Incentives
Citations & Impact (all-time)
  • Citations: 2,699
  • H-index: 24
  • i10-index: 39
  • Publications: 20
  • Co-authors: 55
Academic Achievements
  • Paper 'Robust agents learn causal world models' received an oral presentation and an Outstanding Paper Honorable Mention at ICLR 2024
  • Published 'Evaluating the Goal-Directedness of Large Language Models', introducing an empirically predictive and cross-task consistent measure of LLM goal-directedness
  • Co-authored 'The Reasons that Agents Act: Intention and Instrumental Goals' (AAMAS 2024), formalizing intent in causal models
  • Co-wrote 'AGI Safety Literature Review' (IJCAI 2018), a comprehensive survey of the AGI safety field
  • Co-developed 'AI Safety Gridworlds' (2017), making AGI safety problems concrete through testable environments
  • Proposed modeling AGI safety frameworks using causal influence diagrams (IJCAI AI Safety Workshop, 2019)
  • Developed a general method to infer agent incentives directly from graphical models, notably in 'Agent Incentives: A Causal Perspective' (AAAI 2021)
  • Conducted foundational AGI safety research based on the Universal AI (AIXI) framework (e.g., 2016 paper with Marcus Hutter)
Background
  • Staff Research Scientist at Google DeepMind
  • Focuses on AGI Safety: how to safely build and use highly intelligent AI systems
  • Authored the first PhD thesis specifically on AGI safety: 'Towards Safe Artificial General Intelligence'
  • Currently exploring AGI safety approaches based on amplification of human agency
  • Led the Causal Incentives Working Group, developing alignment theory grounded in Pearlian causality