Selected Publications
'To Steer or Not to Steer? Mechanistic Error Reduction with Abstention for Language Models' (ICML 2025)
'Evaluating Interpretable Methods via Geometric Alignment of Functional Distortions' (TMLR 2025)
'CoSy: Evaluating Textual Explanations of Neurons' (NeurIPS 2024)
'Quanda: An Interpretability Toolkit for Training Data Attribution Evaluation and Beyond' (NeurIPS Workshop 2024)
'Quantus: An Explainable AI Toolkit for Responsible Evaluation of Neural Network Explanations and Beyond' (JMLR 2023)
Full list available on Google Scholar.
Research Experience
Currently a Postdoctoral Fellow at ETH Zürich, supervised by Prof. Dr. Menna El-Assady (IVIA Lab) and Prof. Dr. Andreas Krause (LAS group). Previously, she held multiple ML roles across industry, most recently at J.P. Morgan, where she worked on mechanistic steering of LLMs. Before her Ph.D., she freelanced in ML, worked on credit risk at Klarna and time-series modeling at Bosch, and interned at Black Swan Data and BCG. She also advises and supports startups on AI/ML and contributes to open-source software such as Quantus.
Education
Ph.D. in Machine Learning from TU Berlin, advised by Prof. Dr. Marina Höhne and Prof. Dr. Wojciech Samek; M.Sc. from KTH; B.Sc. from UCL.
Background
Her research aims to advance AI safety by exploring the intersection of evaluation-centric interpretability and alignment of large language models (LLMs). She is interested in developing principled methods that turn mechanistic insights into model internals into signals for steering and post-training control.
Miscellany
Currently based in Zürich, Switzerland. Email: hedstroem.anna@gmail.com