Scholar
Tomek Korbak
Google Scholar ID: YQ5rrk4AAAAJ
UK AI Security Institute
language models
AI safety
reinforcement learning
chain of thought monitoring
LLM agents
Citations & Impact
All-time
Citations
4,090
H-index
22
i10-index
29
Publications
20
Co-authors
8
list available
Contact
Email
tomasz.korbak@gmail.com
Publications
8 items
Reasoning Models Struggle to Control their Chains of Thought
2026
Cited
0
Training Agents to Self-Report Misbehavior
2026
Cited
0
Async Control: Stress-testing Asynchronous Control Measures for LLM Agents
2025
Cited
0
Deep Ignorance: Filtering Pretraining Data Builds Tamper-Resistant Safeguards into Open-Weight LLMs
2025
Cited
0
Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety
2025
Cited
0
How to evaluate control measures for LLM agents? A trajectory from today to superintelligence
2025
Cited
0
Fundamental Limitations in Defending LLM Finetuning APIs
2025
Cited
0
A sketch of an AI control safety case
2025
Cited
0
Resume (English only)
Academic Achievements
Published multiple papers at top-tier conferences including ICLR, ICML, NeurIPS, and COLM, such as:
“A sketch of an AI control safety case”
“Looking Inward: Language Models Can Learn About Themselves by Introspection” (ICLR 2025)
“Is model collapse inevitable? Breaking the curse of recursion by accumulating real and synthetic data” (COLM 2024)
“Pretraining Language Models with Human Preferences” (ICML 2023, oral)
“On reinforcement learning and distribution matching for fine-tuning language models with no catastrophic forgetting” (NeurIPS 2022, oral)
Many papers accompanied by open-source code.
Co-authors
8 total
Owain Evans
Affiliate, CHAI, UC Berkeley
Ethan Perez
Anthropic
Samuel R. Bowman
Anthropic and NYU
Kyunghyun Cho
New York University, Genentech
David Duvenaud
Associate Professor, University of Toronto
Amanda Askell
Anthropic
Anil Seth
Sussex University