Selected Publications
'To Steer or Not to Steer? Mechanistic Error Reduction with Abstention for Language Models' (ICML 2025)
'Evaluating Interpretable Methods via Geometric Alignment of Functional Distortions' (TMLR 2025)
'CoSy: Evaluating Textual Explanations of Neurons' (NeurIPS 2024)
'Quanda: An Interpretability Toolkit for Training Data Attribution Evaluation and Beyond' (NeurIPS Workshop 2024)
'Quantus: An Explainable AI Toolkit for Responsible Evaluation of Neural Network Explanations and Beyond' (JMLR 2023)
Full list available on Google Scholar.
Research Experience
Currently a Postdoctoral Fellow at ETH Zürich, supervised by Prof. Dr. Menna El-Assady (IVIA Lab) and Prof. Dr. Andreas Krause (LAS group). Previously, she held multiple ML roles across industry, most recently at J.P. Morgan, where she worked on mechanistic steering of LLMs. Before her Ph.D., she freelanced in ML, worked on credit risk at Klarna and time-series modeling at Bosch, and interned at Black Swan Data and BCG. She also advises and supports startups on AI/ML and contributes to open-source software such as Quantus.
Education
Ph.D. in Machine Learning from TU Berlin, advised by Prof. Dr. Marina Höhne and Prof. Dr. Wojciech Samek; M.Sc. from KTH; B.Sc. from UCL.
Background
Her research aims to advance AI safety by exploring the intersection of evaluation-centric interpretability and alignment of large language models (LLMs). She is interested in developing principled methods that turn mechanistic insights into model internals into signals for steering and post-training control.
Miscellany
Currently based in Zürich, Switzerland. Email: hedstroem.anna@gmail.com