Scholar
Johannes Treutlein
Google Scholar ID: 9OqlFycAAAAJ
Anthropic
AI Safety
Citations & Impact (all-time)
Citations: 458
H-index: 10
i10-index: 11
Publications: 15
Co-authors: 20
Publications (2 items)
School of Reward Hacks: Hacking harmless tasks generalizes to misaligned behavior in LLMs (2025). Cited: 0
Auditing language models for hidden objectives (2025). Cited: 0
Co-authors
20 total
Samuel Marks
Anthropic
Caspar Oesterheld
Carnegie Mellon University
Jakob Foerster
Associate Professor, University of Oxford
Evan Hubinger
Member of Technical Staff, Anthropic
Roger Grosse
Associate Professor, University of Toronto
Owain Evans
Affiliate, CHAI, UC Berkeley