Agora Research Hub
Arthur Conmy

Google Scholar ID: n4HIyXQAAAAJ
Google DeepMind
Topics: AGI Safety · AI Safety · Interpretability · Mechanistic Interpretability · Machine Learning
Links: Homepage ↗ · Google Scholar ↗
Citations & Impact (all-time)
- Citations: 2,569
- h-index: 18
- i10-index: 20
- Publications: 20
- Co-authors: 20 (list available)
Publications (16 items shown)
- How do LLMs Compute Verbal Confidence (2026), 0 citations
- Automatically Finding Reward Model Biases (2026), 0 citations
- Simple LLM Baselines are Competitive for Model Diffing (2026), 0 citations
- Fluid Representations in Reasoning Models (2026), 0 citations
- Building Production-Ready Probes For Gemini (2026), 1 citation
- Base Models Know How to Reason, Thinking Models Learn When (2025), 0 citations
- Eliciting Secret Knowledge from Language Models (2025), 0 citations
- Thought Anchors: Which LLM Reasoning Steps Matter? (2025), 0 citations
Co-authors (20 total)
- Neel Nanda (Mechanistic Interpretability Team Lead, Google DeepMind)
- Senthooran Rajamanoharan (Google DeepMind)
- Co-author 3
- Aengus Lynch (University College London)
- Co-author 5
- Rohin Shah (Research Scientist, Google DeepMind)
- Jacob Steinhardt (Stanford University)
- Rowan Wang (unknown affiliation)
