Clément Dumas

Google Scholar ID: X3IvIvMAAAAJ
ENS Paris-Saclay
AI safety, Deep Learning, NLP, AI
Citations & Impact
All-time
Citations: 25
H-index: 2
i10-index: 1
Publications: 5
Co-authors: 8 (list available)
Resume (English only)
Academic Achievements
  • Papers accepted and published include 'Overcoming Sparsity Artifacts in Crosscoders to Interpret Chat-Tuning' (NeurIPS 2025), 'Separating Tongue from Thought: Activation Patching Reveals Language-Agnostic Concept Representations in Transformers' (ACL 2025), and 'Narrow Finetuning Leaves Clearly Readable Traces in the Activation Differences' (NeurIPS Mech Interp workshop 2025). Also developed the tools nnterp and tiny-dashboard.
Research Experience
  • Currently working with Julian Minder on evaluating different model diffing methods. Previously focused on interpretability during an internship at EPFL DLAB, following up on the 'Do Llamas work in English?' paper. Also explored the emergence of XOR features in Large Language Models and the RAX hypothesis proposed by Sam Marks, took part in SPAR with Walter Laurito, and worked on non-maximizing training objectives for RL agents with Jobst Heitzig.
Education
  • Completing an MSc in Vision & Learning (MVA) at École Normale Supérieure Paris-Saclay. Previously completed a research internship at EPFL DLAB under the supervision of Chris Wendler and Bob West.
Background
  • MATS Winter 2025 (7.0) Scholar with Neel Nanda. Main research interest is technical AI alignment.
Miscellany
  • Interested in evolutionary biology and how it manifests in artificial life simulations; improviser at Lika, the ENS improv theater club.