DB-KSVD: Scalable Alternating Optimization for Disentangling High-Dimensional Embedding Spaces

📅 2025-05-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the scalability and disentanglement challenges of dictionary learning in high-dimensional Transformer embedding spaces, this paper proposes Double-Batch KSVD (DB-KSVD), an adaptation of the classical KSVD algorithm to datasets with millions of samples and thousands of feature dimensions, in service of mechanistic interpretability of large language models. Built on an alternating optimization framework, the method combines batched sampling, low-rank approximation, and sparse coding updates, implemented efficiently in Julia for high-performance numerical computation. Applied to Gemma-2-2B embeddings, DB-KSVD achieves large-scale disentanglement and matches state-of-the-art sparse autoencoders (SAEs) across all six SAEBench metrics. The core contribution is overcoming KSVD's scalability bottleneck, showing that classical optimization methods with rigorous theoretical foundations can be scaled to structured analysis of high-dimensional embeddings.
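The summary mentions alternating optimization with sparse coding and low-rank (SVD-based) atom updates, which is the structure of the classic KSVD iteration that DB-KSVD builds on. Below is a minimal NumPy sketch of one such iteration (plain KSVD on a single batch, not the paper's double-batched variant, which is in Julia at KSVD.jl); the helper names `omp` and `ksvd_step` are illustrative, not from the paper's code.

```python
import numpy as np

def omp(D, y, k):
    """Greedy orthogonal matching pursuit: pick up to k atoms to approximate y."""
    residual = y.copy()
    support = []
    for _ in range(k):
        # Select the atom most correlated with the current residual.
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j not in support:
            support.append(j)
        # Least-squares refit of y on the chosen atoms.
        coeffs, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coeffs
    x = np.zeros(D.shape[1])
    x[support] = coeffs
    return x

def ksvd_step(D, Y, k):
    """One alternating-optimization pass: sparse coding, then atom updates."""
    # Sparse coding stage: code each sample against the fixed dictionary.
    X = np.stack([omp(D, Y[:, i], k) for i in range(Y.shape[1])], axis=1)
    # Dictionary update stage: refit each atom via a rank-1 SVD
    # of the residual restricted to the samples that use it.
    for j in range(D.shape[1]):
        users = np.nonzero(X[j, :])[0]
        if users.size == 0:
            continue
        E = Y[:, users] - D @ X[:, users] + np.outer(D[:, j], X[j, users])
        U, s, Vt = np.linalg.svd(E, full_matrices=False)
        D[:, j] = U[:, 0]          # best rank-1 atom for this residual
        X[j, users] = s[0] * Vt[0, :]  # matching coefficients
    return D, X
```

The scalability bottleneck the paper targets lives in this loop: the sparse-coding stage is independent per sample (hence batchable), while the per-atom SVDs dominate at thousand-dimensional scale, which is where batching and low-rank approximation come in.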

📝 Abstract
Dictionary learning has recently emerged as a promising approach for mechanistic interpretability of large transformer models. Disentangling high-dimensional transformer embeddings, however, requires algorithms that scale to high-dimensional data with large sample sizes. Recent work has explored sparse autoencoders (SAEs) for this problem. However, SAEs use a simple linear encoder to solve the sparse encoding subproblem, which is known to be NP-hard. It is therefore interesting to understand whether this structure is sufficient to find good solutions to the dictionary learning problem or if a more sophisticated algorithm could find better solutions. In this work, we propose Double-Batch KSVD (DB-KSVD), a scalable dictionary learning algorithm that adapts the classic KSVD algorithm. DB-KSVD is informed by the rich theoretical foundations of KSVD but scales to datasets with millions of samples and thousands of dimensions. We demonstrate the efficacy of DB-KSVD by disentangling embeddings of the Gemma-2-2B model and evaluating on six metrics from the SAEBench benchmark, where we achieve competitive results when compared to established approaches based on SAEs. By matching SAE performance with an entirely different optimization approach, our results suggest that (i) SAEs do find strong solutions to the dictionary learning problem and (ii) that traditional optimization approaches can be scaled to the required problem sizes, offering a promising avenue for further research. We provide an implementation of DB-KSVD at https://github.com/RomeoV/KSVD.jl.
Problem

Research questions and friction points this paper is trying to address.

Scalable disentanglement of high-dimensional transformer embeddings
Evaluating dictionary learning alternatives to sparse autoencoders
Comparing DB-KSVD with SAEs on mechanistic interpretability benchmarks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Scalable Double-Batch KSVD (DB-KSVD) algorithm
Adapts classic KSVD to high-dimensional data with millions of samples
Matches sparse autoencoder performance on SAEBench