Jan Leike

Google Scholar ID: beiWcokAAAAJ
Anthropic
reinforcement learning, deep learning, agent alignment
Citations & Impact
All-time
Citations: 57,951
H-index: 28
i10-index: 37
Publications: 20
Co-authors: 20 (list available)
Publications
9 items
Constitutional Classifiers++: Efficient Production-Grade Defenses against Universal Jailbreaks
arXiv.org · 2026 · Cited: 2
Excess Description Length of Learning Generalizable Predictors
arXiv.org · 2026 · Cited: 0
Natural Emergent Misalignment from Reward Hacking in Production RL
2025 · Cited: 0
Limit-Computable Grains of Truth for Arbitrary Computable Extensive-Form (Un)Known Games
2025 · Cited: 0
Unsupervised Elicitation of Language Models
2025 · Cited: 0
Reasoning Models Don't Always Say What They Think
2025 · Cited: 7
Auditing language models for hidden objectives
2025 · Cited: 0
Forecasting Rare Language Model Behaviors
2025 · Cited: 0
Co-authors
20 total
Jeffrey Wu (Anthropic AI, OpenAI)
John Schulman (Thinking Machines)
Marcus Hutter (Researcher@DeepMind & Professor at ANU)
David Scott Krueger (Assistant Professor, University of Montreal, Mila)