Scholar

Johannes Treutlein

Google Scholar ID: 9OqlFycAAAAJ

Anthropic

AI Safety

Citations & Impact

All-time

Citations: 458

H-index: 10

i10-index: 11

Publications: 15

Co-authors: 20

Publications

2 items

School of Reward Hacks: Hacking harmless tasks generalizes to misaligned behavior in LLMs

2025

Cited: 0

Auditing language models for hidden objectives

2025

Cited: 0

Co-authors

20 total

Caspar Oesterheld

Carnegie Mellon University

Associate Professor, University of Oxford

Member of Technical Staff, Anthropic

Associate Professor, University of Toronto

Affiliate, CHAI, UC Berkeley

University of Toronto; Vector Institute