Evan Hubinger

Google Scholar ID: LRivg1cAAAAJ
Member of Technical Staff, Anthropic
AGI Safety
Citations & Impact (all-time)
  • Citations: 2,514
  • H-index: 19
  • i10-index: 26
  • Publications: 20
  • Co-authors: 0 (list not available)
Academic Achievements
  • Auditing language models for hidden objectives
  • Alignment faking in large language models
  • Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
  • Conditioning Predictive Models
  • An overview of 11 proposals for building safe advanced AI
  • Risks from Learned Optimization
Research Experience
  • Previously: MIRI, OpenAI
Background
  • Head of Alignment Stress-Testing at Anthropic. My posts and comments are my own and do not represent Anthropic's positions, policies, strategies, or opinions.
Miscellany
  • Personal interests: engaging in discussions and encouraging people to apply to the Anthropic Fellows program, a safety-focused mentorship program.