Alessandro Stolfo

Google Scholar ID: Fx50TZQAAAAJ

ETH Zürich

NLP, Machine Learning, Interpretability

Citations & Impact

All-time citations: 753

Publications

7 items

2026: 1 item
2025: 5 items
2024: 1 item (arXiv.org)

Resume (English only)

Academic Achievements

Recipient of the CYD Doctoral Fellowship
NeurIPS 2025: 'Dense SAE Latents Are Features, Not Bugs' (co-first author)
ICLR 2025: 'Improving Instruction-Following in Language Models through Activation Steering' (first author)
NeurIPS 2024: 'Confidence Regulation Neurons in Language Models' (co-first author)
ICML 2024: 'Do Language Models Exhibit the Same Cognitive Biases in Problem Solving as Human Learners?' (co-first author)
EMNLP 2023: 'A Mechanistic Interpretation of Arithmetic Reasoning in Language Models using Causal Mediation Analysis' (first author)
ACL 2023: 'A Causal Framework to Quantify the Robustness of Mathematical Reasoning with Language Models' (co-first author)
Attended the ML Alignment & Theory Scholars (MATS) Program in November 2023, mentored by Neel Nanda
Gave a talk at NEC Labs EU in July 2025 on LLM steering

Co-authors

10 total