Agora Research Hub
Arthur Conmy

Google Scholar ID: n4HIyXQAAAAJ
Google DeepMind
Topics: AGI Safety · AI Safety · Interpretability · Mechanistic Interpretability · Machine Learning
Links: Homepage ↗ · Google Scholar ↗
Citations & Impact (all-time)
- Citations: 2,569
- h-index: 18
- i10-index: 20
- Publications: 20
- Co-authors: 20 (list available)
Publications (16 items shown)
- How do LLMs Compute Verbal Confidence (2026), 0 citations
- Automatically Finding Reward Model Biases (2026), 0 citations
- Simple LLM Baselines are Competitive for Model Diffing (2026), 0 citations
- Fluid Representations in Reasoning Models (2026), 0 citations
- Building Production-Ready Probes For Gemini (2026), 1 citation
- Base Models Know How to Reason, Thinking Models Learn When (2025), 0 citations
- Eliciting Secret Knowledge from Language Models (2025), 0 citations
- Thought Anchors: Which LLM Reasoning Steps Matter? (2025), 0 citations
Co-authors (20 total)
- Neel Nanda (Mechanistic Interpretability Team Lead, Google DeepMind)
- Senthooran Rajamanoharan (Google DeepMind)
- Co-author 3
- Aengus Lynch (University College London)
- Co-author 5
- Rohin Shah (Research Scientist, Google DeepMind)
- Jacob Steinhardt (Stanford University)
- Rowan Wang (unknown affiliation)
