Neel Nanda
Google Scholar ID: GLnX3MkAAAAJ
Mechanistic Interpretability Team Lead, Google DeepMind
Research areas: AI, ML, AI Alignment, Interpretability, Mechanistic Interpretability
Homepage
Google Scholar
Citations & Impact
All-time
Citations: 8,941
H-index: 32
i10-index: 45
Publications: 20
Co-authors: 9
Contact
Email: neelnanda27@gmail.com
Twitter
Publications
35 items
Censored LLMs as a Natural Testbed for Secret Knowledge Elicitation (2026). Cited: 0
Simple LLM Baselines are Competitive for Model Diffing (2026). Cited: 0
Emergent Misalignment is Easy, Narrow Misalignment is Hard (2026). Cited: 0
What's the plan? Metrics for implicit planning in LLMs and their application to rhyme generation and question answering (2026). Cited: 0
Building Production-Ready Probes For Gemini (2026). Cited: 1
Interpretable Embeddings with Sparse Autoencoders: A Data Analysis Toolkit (2025). Cited: 0
Too Late to Recall: Explaining the Two-Hop Problem in Multimodal Knowledge Retrieval (2025). Cited: 0
Difficulties with Evaluating a Deception Detector for AIs (2025). Cited: 0
Resume (English only)
Co-authors
9 total
Arthur Conmy (Google DeepMind)
Senthooran Rajamanoharan (Google DeepMind)
Co-author 3
Wes Gurnee (Anthropic)
Co-author 5
Catherine Olsson (Anthropic)
Lawrence Chan (PhD Student, UC Berkeley)
Bilal Chughtai (Google DeepMind)