Stefan Heimersheim
Google Scholar ID: PX37V5AAAAAJ
Apollo Research
Citations & Impact
All-time citations: 1,124
H-index: 12
i10-index: 13
Publications: 20
Co-authors: 4
Contact
No contact links provided.
Publications
8 items
The Obfuscation Atlas: Mapping Where Honesty Emerges in RLVR with Deception Probes (2026). Cited: 0
Concept Influence: Leveraging Interpretability to Improve Performance and Efficiency in Training Data Attribution (2026). Cited: 0
SCALAR: Benchmarking SAE Interaction Sparsity in Toy LLMs (2025). Cited: 0
Benchmarking Deception Probes via Black-to-White Performance Boosts (2025). Cited: 0
Transformers Don't Need LayerNorm at Inference Time: Scaling LayerNorm Removal to GPT-2 XL and the Implications for Mechanistic Interpretability (2025). Cited: 0
Detecting Strategic Deception Using Linear Probes (2025). Cited: 0
Open Problems in Mechanistic Interpretability (2025). Cited: 0
Interpretability in Parameter Space: Minimizing Mechanistic Description Length with Attribution-based Parameter Decomposition (2025). Cited: 0
Co-authors (4 total)
Aengus Lynch, University College London
Adrià Garriga-Alonso, Research Scientist, FAR AI