Distribution-Aware Feature Selection for SAEs

📅 2025-08-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
Top-K sparse autoencoders (SAEs) reconstruct each token from its K most active latents, which is inefficient under token-wise heterogeneity in information content; BatchTopK improves average reconstruction but risks an "activation lottery," in which rare high-magnitude features crowd out lower-magnitude yet semantically richer ones. To address this, the authors propose Sampled-SAE: the columns (features) of the batch activation matrix are scored by L2 norm or entropy to form a distribution-aware candidate feature pool, and Top-K selection is then applied across the batch from that restricted pool. This generalizes BatchTopK into a tunable spectrum between batch-level and token-specific feature selection. Experiments on Pythia-160M show that no single pool size is best on every metric: the pool size governs the trade-off among shared structure, reconstruction fidelity, and downstream task performance, reframing BatchTopK as a tunable, distribution-aware family.

📝 Abstract
Sparse autoencoders (SAEs) decompose neural activations into interpretable features. A widely adopted variant, the TopK SAE, reconstructs each token from its K most active latents. However, this approach is inefficient, as some tokens carry more information than others. BatchTopK addresses this limitation by selecting top activations across a batch of tokens. This improves average reconstruction but risks an "activation lottery," where rare high-magnitude features crowd out more informative but lower-magnitude ones. To address this issue, we introduce Sampled-SAE: we score the columns (representing features) of the batch activation matrix (via $L_2$ norm or entropy), forming a candidate pool of size $Kl$, and then apply Top-$K$ selection across the batch from the restricted pool of features. Varying $l$ traces a spectrum between batch-level and token-specific selection. At $l=1$, tokens draw only from $K$ globally influential features, while larger $l$ expands the pool toward standard BatchTopK and more token-specific features across the batch. Small $l$ thus enforces global consistency; large $l$ favors fine-grained reconstruction. On Pythia-160M, no single value of $l$ is optimal across all metrics: the best choice depends on the trade-off between shared structure, reconstruction fidelity, and downstream performance. Sampled-SAE thus reframes BatchTopK as a tunable, distribution-aware family.
Problem

Research questions and friction points this paper is trying to address.

Improving feature selection efficiency in sparse autoencoders
Addressing activation lottery in batch-level TopK selection
Balancing global consistency and fine-grained reconstruction trade-offs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sampled-SAE scores features via L2 norm or entropy
Selects top-K activations across the batch from a restricted candidate feature pool
Tunable parameter l balances global vs token-specific selection
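The two-stage selection described above (score feature columns, restrict to a pool of size $Kl$, then apply batch-level Top-$K$ inside the pool) can be sketched as a short NumPy routine. This is an illustrative reading of the abstract, not the authors' implementation: the function name, the dense input/output interface, and the epsilon smoothing in the entropy score are all assumptions.

```python
import numpy as np

def sampled_sae_select(acts, K, l, score="l2"):
    """Hypothetical sketch of Sampled-SAE selection.

    acts : (B, F) batch activation matrix (B tokens, F features).
    Returns a (B, F) matrix in which only the B*K activations chosen by
    BatchTopK within the size-K*l candidate pool are kept nonzero.
    """
    B, F = acts.shape
    pool_size = min(K * l, F)

    # Stage 1: score each feature column (L2 norm or entropy, per the paper).
    if score == "l2":
        col_scores = np.linalg.norm(acts, axis=0)
    else:
        # Entropy of each column's normalized magnitude distribution
        # (epsilon smoothing is an assumption for numerical safety).
        mags = np.abs(acts)
        p = mags / (mags.sum(axis=0, keepdims=True) + 1e-9)
        col_scores = -(p * np.log(p + 1e-9)).sum(axis=0)

    # Restrict to the K*l highest-scoring features: the candidate pool.
    pool = np.argsort(col_scores)[-pool_size:]
    restricted = acts[:, pool]

    # Stage 2: BatchTopK within the pool -- keep the B*K largest
    # activations across the whole batch, not K per token.
    flat = restricted.ravel()
    keep = np.argsort(flat)[-(B * K):]
    mask = np.zeros(flat.shape, dtype=bool)
    mask[keep] = True

    out = np.zeros_like(acts)
    out[:, pool] = np.where(mask.reshape(B, pool_size), restricted, 0.0)
    return out
```

With $l=1$ the pool collapses to the $K$ globally top-scoring features, so every token reconstructs from the same shared set; as $l$ grows the pool approaches all $F$ features and the routine reduces to plain BatchTopK.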