Publications & Tools
Several papers accepted and published, including 'Overcoming Sparsity Artifacts in Crosscoders to Interpret Chat-Tuning' (NeurIPS 2025), 'Separating Tongue from Thought: Activation Patching Reveals Language-Agnostic Concept Representations in Transformers' (ACL 2025), and 'Narrow Finetuning Leaves Clearly Readable Traces in the Activation Differences' (NeurIPS 2025 Mechanistic Interpretability Workshop). Also developed open-source tools such as nnterp and tiny-dashboard.
Research Experience
Currently working with Julian Minder on evaluating different model diffing methods. Previously focused on interpretability during a research internship at EPFL DLAB, following up on the 'Do Llamas Work in English?' paper; explored the emergence of XOR features in large language models and the RAX hypothesis proposed by Sam Marks; participated in SPAR with Walter Laurito; and worked on non-maximizing training objectives for RL agents with Jobst Heitzig.
Education
Completing the MVA MSc (Mathematics, Vision, Learning) at École Normale Supérieure Paris-Saclay. Previously completed a research internship at EPFL DLAB under the supervision of Chris Wendler and Bob West.
Background
MATS 7.0 (Winter 2025) Scholar with Neel Nanda. Main research interest is technical AI alignment.
Miscellany
Interested in evolutionary biology and its manifestation in artificial life simulations; improviser at the ENS improv theater club Lika.