Neel Nanda
Google Scholar ID: GLnX3MkAAAAJ
Mechanistic Interpretability Team Lead, Google DeepMind
Research areas: AI, ML, AI Alignment, Interpretability, Mechanistic Interpretability
Homepage
Google Scholar
Citations & Impact
All-time
Citations: 8,941
H-index: 32
i10-index: 45
Publications: 20
Co-authors: 9
Contact
Email: neelnanda27@gmail.com
Twitter
Publications
35 items
Censored LLMs as a Natural Testbed for Secret Knowledge Elicitation (2026). Cited: 0
Simple LLM Baselines are Competitive for Model Diffing (2026). Cited: 0
Emergent Misalignment is Easy, Narrow Misalignment is Hard (2026). Cited: 0
What's the plan? Metrics for implicit planning in LLMs and their application to rhyme generation and question answering (2026). Cited: 0
Building Production-Ready Probes For Gemini (2026). Cited: 1
Interpretable Embeddings with Sparse Autoencoders: A Data Analysis Toolkit (2025). Cited: 0
Too Late to Recall: Explaining the Two-Hop Problem in Multimodal Knowledge Retrieval (2025). Cited: 0
Difficulties with Evaluating a Deception Detector for AIs (2025). Cited: 0
Resume (English only)
Co-authors
9 total
Arthur Conmy (Google DeepMind)
Senthooran Rajamanoharan (Google DeepMind)
Co-author 3
Wes Gurnee (Anthropic)
Co-author 5
Catherine Olsson (Anthropic)
Lawrence Chan (PhD Student, UC Berkeley)
Bilal Chughtai (Google DeepMind)