Published 'Subliminal Learning: LLMs transmit behavioral traits via hidden signals in data' – showing that a student model can acquire a teacher model's behavioral traits from training data that is semantically unrelated to those traits
Published 'Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs' – demonstrating that finetuning on a narrow task, such as writing insecure code, can make a model broadly misaligned across unrelated prompts
Developed the Situational Awareness Dataset (SAD): the first large-scale, multi-task benchmark for situational awareness in LLMs (7 task categories, 12,000+ questions)
Identified 'The Reversal Curse': LLMs trained on 'A is B' fail to infer 'B is A'
Created the TruthfulQA benchmark, showing that larger models are more likely to imitate common human falsehoods
Developed a lie detector for black-box LLMs using a fixed set of unrelated questions
Communicates research regularly via blog posts, Twitter, and LessWrong
Research Experience
Director of the Truthful AI research group in Berkeley
Previously worked on AI Alignment at the Future of Humanity Institute (FHI), University of Oxford
Former researcher at Ought, currently serves on its Board of Directors
Collaborates with colleagues such as James Chua on AI safety research
Offers the Astra Fellowship for 6-month research stays in Berkeley, with the potential to convert to a full-time role