Scholar
Nina Panickssery
Google Scholar ID: 6-_i-jsAAAAJ
Anthropic
Language Models
AI Alignment
AI Interpretability
ML Safety
Homepage
Google Scholar
Citations & Impact (all-time)
Citations: 862
H-index: 5
i10-index: 4
Publications: 7
Co-authors: 12 (list available)
Contact
Twitter
GitHub
LinkedIn
Publications (3 items)
Beyond Data Filtering: Knowledge Localization for Capability Removal in LLMs
2025 · Cited: 0
Mitigating Many-Shot Jailbreaking
2025 · Cited: 0
Inspection and Control of Self-Generated-Text Recognition Ability in Llama3-8b-Instruct
arXiv.org · 2024 · Cited: 0
Co-authors
12 total
Meg Tong
Anthropic
Evan Hubinger
Member of Technical Staff, Anthropic
Julian Schulz
University of Göttingen
Andy Arditi
Northeastern University
Neel Nanda
Mechanistic Interpretability Team Lead, Google DeepMind
Wes Gurnee
Anthropic