Scholar
Alexey Dontsov
Google Scholar ID: 2SK4CMIAAAAJ
HSE, AI Interpretability Lab
unlearning
mechanistic interpretation
Citations & Impact
All-time
Citations: 24
H-index: 2
i10-index: 2
Publications: 5
Co-authors: 7
Publications
6 items
Sanity Checks for Sparse Autoencoders: Do SAEs Beat Random Baselines? (2026). Cited: 0
The Rogue Scalpel: Activation Steering Compromises LLM Safety (2025). Cited: 0
OrtSAE: Orthogonal Sparse Autoencoders Uncover Atomic Features (2025). Cited: 0
Lightweight error mitigation strategies for post-training N:M activation sparsity in LLMs (2025). Cited: 0
I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders (2025). Cited: 0
CLEAR: Character Unlearning in Textual and Visual Modalities (arXiv.org, 2024). Cited: 0
Co-authors
7 total
Elena Tutubalina (KFU)
Ivan Oseledets (AIRI; Skolkovo Institute of Science and Technology)
Oleg Y. Rogov (University of Sharjah, MTUCI)
Dmitrii Korzh (MTUCI)
Anton Razzhigaev (Independent researcher)
Andrey Galichin (RSI Lab)
Anton Korznikov (Independent researcher)