Mikita Balesni
Google Scholar ID: mDXcNBMAAAAJ
Research Scientist, Apollo Research
large language models
artificial intelligence safety
Citations & Impact (all-time)
Citations: 911
H-index: 9
i10-index: 9
Publications: 15
Co-authors: 9
Contact
No contact links provided.
Publications
6 items
Stress Testing Deliberative Alignment for Anti-Scheming Training
2025 · Cited: 0

Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety
2025 · Cited: 0

AI Behind Closed Doors: a Primer on The Governance of Internal Deployment
2025 · Cited: 0

How to evaluate control measures for LLM agents? A trajectory from today to superintelligence
2025 · Cited: 0

Frontier Models are Capable of In-context Scheming
arXiv.org · 2024 · Cited: 1

The Two-Hop Curse: LLMs trained on A→B, B→C fail to learn A→C
2024 · Cited: 0
Co-authors
9 total
Jérémy Scheurer
Apollo Research
Owain Evans
Affiliate, CHAI, UC Berkeley
Tomek Korbak
UK AI Security Institute
Lukas Berglund
U.S. AI Safety Institute
Meg Tong
Anthropic
Asa Cooper Stickland
Research Scientist, UK AI Security Institute
Max Kaufmann
University of Toronto / Vector Institute
Lee D Sharkey
Goodfire