Fazl Barez

Google Scholar ID: EAjpNIMAAAAJ
University of Oxford
Research areas: AI Safety · Explainability · Interpretability · AI Governance and Policy
Citations & Impact (all-time)
  • Citations: 1,573
  • H-index: 17
  • i10-index: 21
  • Publications: 20
  • Co-authors: 17
Academic Achievements
  • Developed the N2G algorithm adopted by OpenAI to evaluate sparse autoencoders for interpretability
  • Spearheaded the Alan Turing Institute's response to the UK House of Lords inquiry on large language models
  • Papers accepted at top conferences, including NeurIPS 2025 (3), EMNLP 2025 (4), ICLR 2025, NeurIPS 2024, EMNLP 2024 (2), ACL 2024, and ICML 2024 (2)
  • Serving as Area Chair for ACL 2025
  • Programme Committee member for ECAI 2024
  • Co-organized the first Mechanistic Interpretability workshop at ICML 2024
  • Research funded by OpenAI, Anthropic, Schmidt Sciences, Future of Life Institute, NVIDIA, among others
Research Experience
  • Currently Senior Research Fellow at the University of Oxford leading research on Technical AI Safety and Governance
  • Collaborated with Anthropic's Alignment team (2024–2025) on studies of deception and reward hacking
  • Led research with the UK AI Security Institute on machine unlearning for AI safety
  • Previously researcher at Amazon and Huawei
  • Former Co-director and Head of Research at Apart Research
  • Affiliated with CSER (University of Cambridge), the Digital Trust Centre (NTU), and the School of Informatics (University of Edinburgh); member of ELLIS
Background
  • Bridges technical innovation with real-world impact, developing algorithms adopted by major AI labs and shaping high-level government policy
  • Currently serves as an advisor to Martian
  • Interested in commercializing interpretability research through startups to address real-world problems
  • Values collaborative work and is especially committed to partnering with researchers from underrepresented, marginalized, or disadvantaged backgrounds