Stefan Heimersheim
Scholar

Google Scholar ID: PX37V5AAAAAJ
Apollo Research
Citations & Impact
All-time
Citations: 1,124
H-index: 12
i10-index: 13
Publications: 20
Co-authors: 4 (list available)
Publications
8 items
The Obfuscation Atlas: Mapping Where Honesty Emerges in RLVR with Deception Probes (2026). Cited: 0
Concept Influence: Leveraging Interpretability to Improve Performance and Efficiency in Training Data Attribution (2026). Cited: 0
SCALAR: Benchmarking SAE Interaction Sparsity in Toy LLMs (2025). Cited: 0
Benchmarking Deception Probes via Black-to-White Performance Boosts (2025). Cited: 0
Transformers Don't Need LayerNorm at Inference Time: Scaling LayerNorm Removal to GPT-2 XL and the Implications for Mechanistic Interpretability (2025). Cited: 0
Detecting Strategic Deception Using Linear Probes (2025). Cited: 0
Open Problems in Mechanistic Interpretability (2025). Cited: 0
Interpretability in Parameter Space: Minimizing Mechanistic Description Length with Attribution-based Parameter Decomposition (2025). Cited: 0
Co-authors
4 total
Aengus Lynch, University College London
Adrià Garriga-Alonso, Research Scientist, FAR AI
