Scholar

Ethan Perez

Google Scholar ID: za0-taQAAAAJ

Anthropic

AI Safety

Citations & Impact

All-time

Citations: 26,523
H-index: 41
i10-index: 57
Publications: 20
Co-authors: 13

Publications

14 items

The Hot Mess of AI: How Does Misalignment Scale With Model Intelligence and Task Complexity?
2026 · Cited: 0

Constitutional Classifiers++: Efficient Production-Grade Defenses against Universal Jailbreaks
arXiv.org · 2026 · Cited: 2

Natural Emergent Misalignment from Reward Hacking in Production RL
2025 · Cited: 0

Agentic Misalignment: How LLMs Could Be Insider Threats
2025 · Cited: 0

Towards Safeguarding LLM Fine-tuning APIs against Cipher Attacks
2025 · Cited: 0

Inverse Scaling in Test-Time Compute
2025 · Cited: 0

Unsupervised Elicitation of Language Models
2025 · Cited: 0

Reasoning Models Don't Always Say What They Think
2025 · Cited: 7

Co-authors

13 total

Contextual AI, Stanford University

Director of Agentic AI, Cohere

Samuel R. Bowman

Anthropic and NYU

Sebastian Riedel

Honorary Professor @ University College London, Researcher @ DeepMind

New York University, Genentech

Aaron Courville

Professor, DIRO, Université de Montréal, Mila, Cifar CAI chair

Johns Hopkins University & Anthropic

Member of Technical Staff, Anthropic