Fazl Barez
Google Scholar ID: EAjpNIMAAAAJ
University of Oxford
Research interests: AI Safety · Explainability · Interpretability · AI Governance and Policy
Homepage
Google Scholar
Citations & Impact (all-time)
Citations: 1,573
H-index: 17
i10-index: 21
Publications: 20
Co-authors: 17
Contact
Email: fazlbarez93@gmail.com
CV
Twitter
GitHub
LinkedIn
Publications (29 items)
AutoControl Arena: Synthesizing Executable Test Environments for Frontier AI Risk Evaluation (2026) · Cited: 0
Same Answer, Different Representations: Hidden instability in VLMs (2026) · Cited: 0
Full-Stack Alignment: Co-Aligning AI and Institutions with Thick Models of Value (2025) · Cited: 1
Chain-of-Thought Hijacking (2025) · Cited: 0
HACK: Hallucinations Along Certainty and Knowledge Axes (2025) · Cited: 0
VAL-Bench: Measuring Value Alignment in Language Models (2025) · Cited: 0
Beyond Linear Probes: Dynamic Safety Monitoring for Language Models (2025) · Cited: 0
Query Circuits: Explaining How Language Models Answer User Prompts (2025) · Cited: 0
Resume (English only)
Academic Achievements
Developed the N2G algorithm adopted by OpenAI to evaluate sparse autoencoders for interpretability
Spearheaded the Alan Turing Institute's response to the UK House of Lords inquiry on large language models
Multiple papers accepted at top conferences: NeurIPS 2025 (3), EMNLP 2025 (4), ICLR 2025, NeurIPS 2024, EMNLP 2024 (2), ACL 2024, ICML 2024 (2), etc.
Serving as Area Chair for ACL 2025
Programme Committee member for ECAI 2024
Co-organized the first Mechanistic Interpretability workshop at ICML 2024
Research funded by OpenAI, Anthropic, Schmidt Sciences, Future of Life Institute, NVIDIA, among others
Research Experience
Currently Senior Research Fellow at the University of Oxford leading research on Technical AI Safety and Governance
Collaborated with Anthropic's Alignment team (2024–2025) on studies of deception and reward hacking
Led research with the UK AI Security Institute on machine unlearning for AI safety
Previously researcher at Amazon and Huawei
Former Co-director and Head of Research at Apart Research
Affiliated with Cambridge's Centre for the Study of Existential Risk (CSER), NTU's Digital Trust Centre, and the University of Edinburgh's School of Informatics; member of ELLIS
Background
Senior Research Fellow at the University of Oxford, leading research on Technical AI Safety and Governance
Bridges technical innovation with real-world impact—developing algorithms adopted by major AI labs and shaping high-level government policy
Currently serves as an advisor to Martian
Interested in commercializing interpretability research through startups to address real-world problems
Values collaborative work and is especially committed to partnering with researchers from underrepresented, marginalized, or disadvantaged backgrounds
Co-authors (17 total)
Philip Torr
Professor, University of Oxford
David Scott Krueger
Assistant Professor, University of Montreal, Mila
Shay Cohen
University of Edinburgh
Mrinank Sharma
Anthropic
Ethan Perez
Anthropic
Evan Hubinger
Member of Technical Staff, Anthropic
Neel Nanda
Mechanistic Interpretability Team Lead, Google DeepMind
Mor Geva
Tel Aviv University, Google Research