Andy Arditi
Google Scholar ID: NgyIgX4AAAAJ
Northeastern University
Interpretability
Homepage
Google Scholar
Citations & Impact (all-time)
Citations: 367
H-index: 6
i10-index: 5
Publications: 10
Co-authors: 7
Contact
Email: andyrdt@gmail.com
Twitter
Publications (5 items)
Weird Generalization and Inductive Backdoors: New Ways to Corrupt LLMs (2025), cited by 0
Persona Vectors: Monitoring and Controlling Character Traits in Language Models (2025), cited by 0
Inverse Scaling in Test-Time Compute (2025), cited by 0
Adversarial Manipulation of Reasoning Models using Internal Representations (2025), cited by 0
Teaching Models to Verbalize Reward Hacking in Chain-of-Thought Reasoning (2025), cited by 0
Resume (English only)
Background
Research Interest: AI interpretability
Miscellany
Contact Information: andyrdt@gmail.com, andyarditi
Co-authors (7 total)
Neel Nanda (Mechanistic Interpretability Team Lead, Google DeepMind)
Wes Gurnee (Anthropic)
Nina Panickssery (Anthropic)
Daniel Paleka (ETH Zurich)
Runjin Chen (PhD student at UT Austin)
Jack Lindsey (Anthropic)