Arthur Conmy
Google Scholar ID: n4HIyXQAAAAJ
Google DeepMind
AGI Safety
AI Safety
Interpretability
Mechanistic Interpretability
Machine Learning
Homepage
Google Scholar
Citations & Impact (all-time)
Citations: 2,569
H-index: 18
i10-index: 20
Publications: 20
Co-authors: 20
Contact
No contact links provided.
Publications
16 items
How do LLMs Compute Verbal Confidence (2026) · Cited by 0
Automatically Finding Reward Model Biases (2026) · Cited by 0
Simple LLM Baselines are Competitive for Model Diffing (2026) · Cited by 0
Fluid Representations in Reasoning Models (2026) · Cited by 0
Building Production-Ready Probes For Gemini (2026) · Cited by 1
Base Models Know How to Reason, Thinking Models Learn When (2025) · Cited by 0
Eliciting Secret Knowledge from Language Models (2025) · Cited by 0
Thought Anchors: Which LLM Reasoning Steps Matter? (2025) · Cited by 0
Resume (English only)
Co-authors
20 total
Neel Nanda · Mechanistic Interpretability Team Lead, Google DeepMind
Senthooran Rajamanoharan · Google DeepMind
Aengus Lynch · University College London
Rohin Shah · Research Scientist, Google DeepMind
Jacob Steinhardt · Stanford University
Rowan Wang · Unknown affiliation