Scholar
Alessandro Stolfo
Google Scholar ID: Fx50TZQAAAAJ
ETH Zürich
NLP
Machine Learning
Interpretability
Citations & Impact
All-time
Citations: 753
H-index: 8
i10-index: 8
Publications: 16
Co-authors: 10
list available
Contact
Email
alessandro.stolfo@inf.ethz.ch
Publications
7 items
Fluid Representations in Reasoning Models, 2026. Cited: 0
On the Emergence of Induction Heads for In-Context Learning, 2025. Cited: 0
Probing for Arithmetic Errors in Language Models, 2025. Cited: 0
Dense SAE Latents Are Features, Not Bugs, 2025. Cited: 0
Transferring Features Across Language Models With Model Stitching, 2025. Cited: 0
MIB: A Mechanistic Interpretability Benchmark, 2025. Cited: 0
Improving Instruction-Following in Language Models through Activation Steering, arXiv.org, 2024. Cited: 11
Resume (English only)
Academic Achievements
Recipient of the CYD Doctoral Fellowship
NeurIPS 2025: 'Dense SAE Latents Are Features, Not Bugs' (co-first author)
ICLR 2025: 'Improving Instruction-Following in Language Models through Activation Steering' (first author)
NeurIPS 2024: 'Confidence Regulation Neurons in Language Models' (co-first author)
ICML 2024: 'Do Language Models Exhibit the Same Cognitive Biases in Problem Solving as Human Learners?' (co-first author)
EMNLP 2023: 'A Mechanistic Interpretation of Arithmetic Reasoning in Language Models using Causal Mediation Analysis' (first author)
ACL 2023: 'A Causal Framework to Quantify the Robustness of Mathematical Reasoning with Language Models' (co-first author)
Attended ML Alignment & Theory Scholars (MATS) Program in Nov 2023, mentored by Neel Nanda
Gave a talk at NEC Labs EU in July 2025 on LLM steering
Co-authors
10 total
Mrinmaya Sachan
Assistant Professor, ETH Zürich
Yonatan Belinkov
Technion
Bernhard Schölkopf
Director, Max Planck Institute for Intelligent Systems & ELLIS Institute Tübingen; Professor at ETH
Zhijing Jin
Max Planck Institute
Neel Nanda
Mechanistic Interpretability Team Lead, Google DeepMind
Wes Gurnee
Anthropic
Eric Horvitz
Microsoft
Besmira Nushi
Microsoft Research