
Nina Panickssery

Google Scholar ID: 6-_i-jsAAAAJ
Anthropic
Language Models, AI Alignment, AI Interpretability, ML Safety
Citations & Impact (all-time)
Citations: 862
H-index: 5
i10-index: 4
Publications: 7
Co-authors: 12
Publications (3 items)
Beyond Data Filtering: Knowledge Localization for Capability Removal in LLMs (2025). Cited: 0
Mitigating Many-Shot Jailbreaking (2025). Cited: 0
Inspection and Control of Self-Generated-Text Recognition Ability in Llama3-8b-Instruct (arXiv.org, 2024). Cited: 0
Co-authors (12 total)
Meg Tong, Anthropic
Evan Hubinger, Member of Technical Staff, Anthropic
Julian Schulz, University of Göttingen
Andy Arditi, Northeastern University
Neel Nanda, Mechanistic Interpretability Team Lead, Google DeepMind
Wes Gurnee, Anthropic
