
Alexey Dontsov

Google Scholar ID: 2SK4CMIAAAAJ

HSE, AI Interpretability Lab

unlearning, mechanistic interpretation


Citations & Impact

All-time

Citations: 24
H-index: 2
i10-index: 2
Publications: 5
Co-authors: 7


Contact

No contact links provided.

Publications

6 items

Sanity Checks for Sparse Autoencoders: Do SAEs Beat Random Baselines?
2026 · Cited by 0

The Rogue Scalpel: Activation Steering Compromises LLM Safety
2025 · Cited by 0

OrtSAE: Orthogonal Sparse Autoencoders Uncover Atomic Features
2025 · Cited by 0

Lightweight error mitigation strategies for post-training N:M activation sparsity in LLMs
2025 · Cited by 0

I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders
2025 · Cited by 0

CLEAR: Character Unlearning in Textual and Visual Modalities
arXiv.org · 2024 · Cited by 0


Co-authors

7 total

Elena Tutubalina

AIRI; Skolkovo Institute of Science and Technology

University of Sharjah, MTUCI

Anton Razzhigaev

Independent researcher

Andrey Galichin

Anton Korznikov

Independent researcher