Mikita Balesni

Google Scholar ID: mDXcNBMAAAAJ
Research Scientist, Apollo Research
Research interests: large language models · artificial intelligence safety
Google Scholar
Citations & Impact (all-time)
Citations: 911
H-index: 9
i10-index: 9
Publications: 15
Co-authors: 9
Contact
No contact links provided.
Publications (6 items)
- Stress Testing Deliberative Alignment for Anti-Scheming Training (2025). Cited by 0.
- Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety (2025). Cited by 0.
- AI Behind Closed Doors: A Primer on the Governance of Internal Deployment (2025). Cited by 0.
- How to Evaluate Control Measures for LLM Agents? A Trajectory from Today to Superintelligence (2025). Cited by 0.
- Frontier Models are Capable of In-context Scheming (arXiv.org, 2024). Cited by 1.
- The Two-Hop Curse: LLMs Trained on A→B, B→C Fail to Learn A→C (2024). Cited by 0.
Resume: available (English only)
Co-authors (9 total)
- Jérémy Scheurer (Apollo Research)
- Owain Evans (Affiliate, CHAI, UC Berkeley)
- Tomek Korbak (UK AI Security Institute)
- Lukas Berglund (U.S. AI Safety Institute)
- Meg Tong (Anthropic)
- Asa Cooper Stickland (Research Scientist, UK AI Security Institute)
- Max Kaufmann (University of Toronto / Vector Institute)
- Lee D Sharkey (Goodfire)