Fazl Barez
Google Scholar ID: EAjpNIMAAAAJ
University of Oxford
Research interests: AI Safety · Explainability · Interpretability · AI Governance and Policy
Homepage
Google Scholar
Citations & Impact (all-time)
Citations: 1,573
H-index: 17
i10-index: 21
Publications: 20
Co-authors: 17
Contact
Email: fazlbarez93@gmail.com
CV
Twitter
GitHub
LinkedIn
Publications (29 items)
AutoControl Arena: Synthesizing Executable Test Environments for Frontier AI Risk Evaluation (2026) · Cited: 0
Same Answer, Different Representations: Hidden instability in VLMs (2026) · Cited: 0
Full-Stack Alignment: Co-Aligning AI and Institutions with Thick Models of Value (2025) · Cited: 1
Chain-of-Thought Hijacking (2025) · Cited: 0
HACK: Hallucinations Along Certainty and Knowledge Axes (2025) · Cited: 0
VAL-Bench: Measuring Value Alignment in Language Models (2025) · Cited: 0
Beyond Linear Probes: Dynamic Safety Monitoring for Language Models (2025) · Cited: 0
Query Circuits: Explaining How Language Models Answer User Prompts (2025) · Cited: 0
Resume (English only)
Academic Achievements
Developed the N2G algorithm adopted by OpenAI to evaluate sparse autoencoders for interpretability
Spearheaded the Alan Turing Institute's response to the UK House of Lords inquiry on large language models
Multiple papers accepted at top conferences: NeurIPS 2025 (3), EMNLP 2025 (4), ICLR 2025, NeurIPS 2024, EMNLP 2024 (2), ACL 2024, ICML 2024 (2), etc.
Serving as Area Chair for ACL 2025
Programme Committee member for ECAI 2024
Co-organized the first Mechanistic Interpretability workshop at ICML 2024
Research funded by OpenAI, Anthropic, Schmidt Sciences, Future of Life Institute, NVIDIA, among others
Research Experience
Currently Senior Research Fellow at the University of Oxford leading research on Technical AI Safety and Governance
Collaborated with Anthropic's Alignment team (2024–2025) on studies of deception and reward hacking
Led research with the UK AI Security Institute on machine unlearning for AI safety
Previously researcher at Amazon and Huawei
Former Co-director and Head of Research at Apart Research
Affiliated with Cambridge's Centre for the Study of Existential Risk (CSER), NTU's Digital Trust Centre, and the University of Edinburgh's School of Informatics; member of ELLIS
Background
Senior Research Fellow at the University of Oxford, leading research on Technical AI Safety and Governance
Bridges technical innovation with real-world impact—developing algorithms adopted by major AI labs and shaping high-level government policy
Currently serves as an advisor to Martian
Interested in commercializing interpretability research through startups to address real-world problems
Values collaborative work and is especially committed to partnering with researchers from underrepresented, marginalized, or disadvantaged backgrounds
Co-authors (17 total)
Philip Torr
Professor, University of Oxford
David Scott Krueger
Assistant Professor, University of Montreal, Mila
Shay Cohen
University of Edinburgh
Mrinank Sharma
Anthropic
Ethan Perez
Anthropic
Evan Hubinger
Member of Technical Staff, Anthropic
Neel Nanda
Mechanistic Interpretability Team Lead, Google DeepMind
Mor Geva
Tel Aviv University, Google Research