Mikita Balesni
Google Scholar ID: mDXcNBMAAAAJ
Research Scientist, Apollo Research
large language models
artificial intelligence safety
Citations & Impact (all-time)
Citations: 911
H-index: 9
i10-index: 9
Publications: 15
Co-authors: 9
Contact
No contact links provided.
Publications
6 items
Stress Testing Deliberative Alignment for Anti-Scheming Training
2025 · Cited: 0

Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety
2025 · Cited: 0

AI Behind Closed Doors: a Primer on The Governance of Internal Deployment
2025 · Cited: 0

How to evaluate control measures for LLM agents? A trajectory from today to superintelligence
2025 · Cited: 0

Frontier Models are Capable of In-context Scheming
arXiv.org · 2024 · Cited: 1

The Two-Hop Curse: LLMs trained on A→B, B→C fail to learn A→C
2024 · Cited: 0
Co-authors
9 total
Jérémy Scheurer
Apollo Research
Owain Evans
Affiliate, CHAI, UC Berkeley
Tomek Korbak
UK AI Security Institute
Lukas Berglund
U.S. AI Safety Institute
Meg Tong
Anthropic
Asa Cooper Stickland
Research Scientist, UK AI Security Institute
Max Kaufmann
University of Toronto / Vector Institute
Lee D Sharkey
Goodfire