Owain Evans
Scholar

Google Scholar ID: 4VpTwzIAAAAJ
Affiliate, CHAI, UC Berkeley
AI alignment · Artificial Intelligence · Machine Learning · AI safety · Truthful AI
Citations & Impact (all-time)
  • Citations: 10,161
  • H-index: 27
  • i10-index: 38
  • Publications: 20
  • Co-authors: 15
Resume (English only)
Academic Achievements
  • Published 'Subliminal Learning: LLMs transmit behavioral traits via hidden signals in data' – showing that LLMs can transmit behavioral traits through hidden signals in training data
  • Published 'Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs' – demonstrating that finetuning on a narrow task can generalize to broadly harmful behaviors
  • Developed the Situational Awareness Dataset (SAD): the first large-scale, multi-task benchmark for situational awareness in LLMs (7 task categories, 12,000+ questions)
  • Proposed 'The Reversal Curse': LLMs trained on facts of the form 'A is B' fail to infer the reverse, 'B is A'
  • Created the TruthfulQA benchmark, which revealed that larger models are more likely to mimic common human falsehoods
  • Developed a lie detector for black-box LLMs using a fixed set of unrelated questions
  • Maintains active research communication via blogs, Twitter, and LessWrong
Research Experience
  • Director of the Truthful AI research group in Berkeley, California
  • Previously worked on AI Alignment at the Future of Humanity Institute (FHI), University of Oxford
  • Former researcher at Ought, currently serves on its Board of Directors
  • Collaborates with colleagues such as James Chua on AI safety research
  • Offers the Astra Fellowship for 6-month research stays in Berkeley, with potential conversion to full-time roles