Johannes Treutlein
Scholar


Google Scholar ID: 9OqlFycAAAAJ
Anthropic
AI Safety
Links: Homepage, Google Scholar
Citations & Impact (all-time)
Citations: 458
H-index: 10
i10-index: 11
Publications: 15
Co-authors: 20 (list available)
Contact
No contact links provided.
Publications (2 items)
School of Reward Hacks: Hacking harmless tasks generalizes to misaligned behavior in LLMs (2025), cited 0
Auditing language models for hidden objectives (2025), cited 0
Co-authors (20 total)
Samuel Marks, Anthropic
Co-author 2
Caspar Oesterheld, Carnegie Mellon University
Jakob Foerster, Associate Professor, University of Oxford
Evan Hubinger, Member of Technical Staff, Anthropic
Roger Grosse, Associate Professor, University of Toronto
Owain Evans, Affiliate, CHAI, UC Berkeley
Co-author 8
