Neel Nanda

Google Scholar ID: GLnX3MkAAAAJ
Mechanistic Interpretability Team Lead, Google DeepMind
Tags: AI · ML · AI Alignment · Interpretability · Mechanistic Interpretability
Homepage ↗ · Google Scholar ↗
Citations & Impact (all-time)
Citations: 8,941
H-index: 32
i10-index: 45
Publications: 20
Co-authors: 9 (list available)
Contact
Email: neelnanda27@gmail.com · Twitter ↗
Publications (35 items)
Censored LLMs as a Natural Testbed for Secret Knowledge Elicitation (2026) · Cited: 0
Simple LLM Baselines are Competitive for Model Diffing (2026) · Cited: 0
Emergent Misalignment is Easy, Narrow Misalignment is Hard (2026) · Cited: 0
What's the plan? Metrics for implicit planning in LLMs and their application to rhyme generation and question answering (2026) · Cited: 0
Building Production-Ready Probes For Gemini (2026) · Cited: 1
Interpretable Embeddings with Sparse Autoencoders: A Data Analysis Toolkit (2025) · Cited: 0
Too Late to Recall: Explaining the Two-Hop Problem in Multimodal Knowledge Retrieval (2025) · Cited: 0
Difficulties with Evaluating a Deception Detector for AIs (2025) · Cited: 0
Resume (English only)
Co-authors (9 total)
Arthur Conmy · Google DeepMind
Senthooran Rajamanoharan · Google DeepMind
Co-author 3
Wes Gurnee · Anthropic
Co-author 5
Catherine Olsson · Anthropic
Lawrence Chan · PhD Student, UC Berkeley
Bilal Chughtai · Google DeepMind
