Beyond Redundancy: Diverse and Specialized Multi-Expert Sparse Autoencoder

📅 2025-11-07
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Sparse autoencoders (SAEs) in LLM interpretability research suffer from high computational overhead due to high-dimensional hidden layers, insufficient expert specialization in Mixture-of-Experts (MoE) architectures, and severe feature redundancy. This paper proposes a multi-expert collaborative activation mechanism coupled with adaptive high-frequency feature scaling. Leveraging semantic-weighted gating, the method enables fine-grained routing across experts while dynamically suppressing redundant high-frequency features, thereby significantly enhancing functional decoupling among experts. Experiments demonstrate that, compared to state-of-the-art SAE-MoE approaches, our method reduces reconstruction error by 24% and feature redundancy by 99%. It achieves strong interpretability without sacrificing inference efficiency—marking the first approach to realize sparse feature learning with high diversity, low overlap, and expert-level specialization.
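The summary describes two mechanisms: a semantic-weighted gate that activates several experts at once, and adaptive suppression of high-frequency features. A minimal sketch of the multi-expert activation idea, with all parameter names and shapes assumed for illustration (the paper's actual gating formula is not given here):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, d_expert, k = 16, 4, 8, 2  # assumed toy sizes

# Hypothetical parameters: one gating matrix, per-expert encoder/decoder weights.
W_gate = rng.standard_normal((d_model, n_experts))
W_enc = rng.standard_normal((n_experts, d_model, d_expert))
W_dec = rng.standard_normal((n_experts, d_expert, d_model))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def moe_sae_encode(x):
    """Route activation x through the top-k experts, weighted by gate scores."""
    scores = softmax(x @ W_gate)           # semantic-weighted gate over experts
    topk = np.argsort(scores)[-k:]         # activate multiple experts at once
    recon = np.zeros_like(x)
    feats = {}
    for e in topk:
        z = np.maximum(x @ W_enc[e], 0.0)  # ReLU sparse code for expert e
        feats[e] = z
        recon += scores[e] * (z @ W_dec[e])  # gate-weighted reconstruction
    return feats, recon

x = rng.standard_normal(d_model)
feats, recon = moe_sae_encode(x)
```

Activating a weighted subset rather than a single expert is what gives each expert a chance to learn a distinct slice of the feature space instead of duplicating its neighbors.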

📝 Abstract
Sparse autoencoders (SAEs) have emerged as a powerful tool for interpreting large language models (LLMs) by decomposing token activations into combinations of human-understandable features. While SAEs provide crucial insights into LLM explanations, their practical adoption faces a fundamental challenge: better interpretability demands that SAEs' hidden layers have high dimensionality to satisfy sparsity constraints, resulting in prohibitive training and inference costs. Recent Mixture of Experts (MoE) approaches attempt to address this by partitioning SAEs into narrower expert networks with gated activation, thereby reducing computation. In a well-designed MoE, each expert should focus on learning a distinct set of features. However, we identify a critical limitation in MoE-SAE: experts often fail to specialize, which means they frequently learn overlapping or identical features. To address this, we propose two key innovations: (1) Multiple Expert Activation, which simultaneously engages semantically weighted expert subsets to encourage specialization, and (2) Feature Scaling, which enhances diversity through adaptive high-frequency scaling. Experiments demonstrate a 24% lower reconstruction error and a 99% reduction in feature redundancy compared to existing MoE-SAE methods. This work bridges the interpretability-efficiency gap in LLM analysis, allowing transparent model inspection without compromising computational feasibility.
Problem

Research questions and friction points this paper is trying to address.

SAEs face high computational costs from large hidden layers
MoE-SAE experts fail to specialize and learn overlapping features
Current methods struggle with the interpretability-efficiency trade-off in LLMs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multiple Expert Activation encourages expert specialization
Feature Scaling enhances diversity through adaptive scaling
Reduces reconstruction error and feature redundancy significantly
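The Feature Scaling idea above can be sketched as a frequency-based down-weighting of sparse codes. The `1 / (1 + alpha * freq)` form below is an assumed illustration, not the paper's actual formula:

```python
import numpy as np

def frequency_scale(codes, alpha=1.0):
    """Down-weight features that fire on many tokens (high-frequency features).

    codes: (n_tokens, n_features) nonnegative sparse codes.
    Returns rescaled codes; the scale 1/(1 + alpha*freq) is an assumed form.
    """
    freq = (codes > 0).mean(axis=0)     # activation frequency per feature
    scale = 1.0 / (1.0 + alpha * freq)  # adaptively suppress frequent features
    return codes * scale

codes = np.array([[1.0, 2.0, 0.0],
                  [3.0, 0.5, 0.0],
                  [0.7, 1.0, 0.0]])
scaled = frequency_scale(codes)
# Features firing on every token (freq = 1.0) are halved; unused ones untouched.
```

Suppressing features that fire everywhere pushes the reconstruction burden onto rarer, more specific features, which is one plausible route to the reported drop in feature redundancy.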