Alessandro Stolfo
Scholar

Google Scholar ID: Fx50TZQAAAAJ
ETH Zürich
NLP, Machine Learning, Interpretability
Citations & Impact
Citations (all-time): 753
H-index: 8
i10-index: 8
Publications: 16
Co-authors: 10 (list available)
Resume (English only)
Academic Achievements
  • Recipient of the CYD Doctoral Fellowship
  • NeurIPS 2025: 'Dense SAE Latents Are Features, Not Bugs' (co-first author)
  • ICLR 2025: 'Improving Instruction-Following in Language Models through Activation Steering' (first author)
  • NeurIPS 2024: 'Confidence Regulation Neurons in Language Models' (co-first author)
  • ICML 2024: 'Do Language Models Exhibit the Same Cognitive Biases in Problem Solving as Human Learners?' (co-first author)
  • EMNLP 2023: 'A Mechanistic Interpretation of Arithmetic Reasoning in Language Models using Causal Mediation Analysis' (first author)
  • ACL 2023: 'A Causal Framework to Quantify the Robustness of Mathematical Reasoning with Language Models' (co-first author)
  • Attended the ML Alignment & Theory Scholars (MATS) Program in November 2023, mentored by Neel Nanda
  • Gave a talk at NEC Labs EU in July 2025 on LLM steering