Scalable Knowledge Editing for Mixture-of-Experts LLMs via Tensor-Structured Updates

📅 2026-05-15

🤖 AI Summary
Existing knowledge editing methods struggle to adapt to mainstream Mixture-of-Experts (MoE) large language models, lacking efficient and scalable solutions. This work presents the first effective extension of closed-form, parameter-modification-based editing to MoE architectures by introducing an editing framework that leverages the tensor structure of expert layers. The approach constructs editing targets at each expert layer and combines a MEMIT-style batching strategy with the Woodbury matrix identity, enabling edits through low-dimensional matrix inversion without instantiating full weights or performing backpropagation. While maintaining editing quality on par with strong baselines, the method achieves up to a 6× speedup, offering a highly efficient and scalable solution for knowledge editing in MoE-based models.
📝 Abstract
Knowledge editing (KE) provides a lightweight alternative to repeated fine-tuning of LLMs. However, most existing KE methods target dense feed-forward layers, while modern LLMs increasingly adopt Mixture-of-Experts (MoE) architectures for their superior memory footprint and inference efficiency. This mismatch leaves a growing class of production models without principled editing tools. We propose a MEMIT-like framework for knowledge editing in MoE-based LLMs. Our method exploits the tensor structure of MoE layers to formulate the editing objective faithfully at the per-expert level, and applies the Woodbury matrix identity to avoid materializing or inverting the full stacked matrix of expert weights. The resulting update reduces to inversions of fixed low-rank matrices and requires no additional backward passes. Empirically, our approach matches the editing quality of strong baselines on the main KE metrics while accelerating the editing procedure by up to 6×, owing to the batched MEMIT-style formulation and the low-dimensional inversions enabled by the Woodbury identity. These results show that closed-form, parameter-modifying KE can be extended efficiently beyond dense layers, opening a path toward scalable knowledge editing in modern sparse LLM architectures.
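To make the role of the Woodbury identity concrete, here is a minimal NumPy sketch on a single dense layer. It is not the paper's per-expert algorithm. It assumes the standard MEMIT-style closed-form update ΔW = R Kᵀ (C + K Kᵀ)⁻¹, where K (d × n) stacks the keys of n edits, R (d_out × n) holds the residuals to be written, and C (d × d) is a key covariance term. For illustration only, C is taken to be diagonal so that C⁻¹ is cheap; the function names and that diagonal assumption are hypothetical and do not come from the paper. Under these assumptions the identity lets the d × d inverse be replaced by an n × n solve.

```python
import numpy as np

def update_direct(R, K, c_diag):
    """Reference form: inverts the full d x d matrix C + K K^T."""
    C = np.diag(c_diag)
    return R @ K.T @ np.linalg.inv(C + K @ K.T)

def update_woodbury(R, K, c_diag):
    """Same update using only an n x n solve (n = number of edits).

    Uses K^T (C + K K^T)^{-1} = (I + K^T C^{-1} K)^{-1} K^T C^{-1},
    a consequence of the Woodbury identity.
    Assumption: C is diagonal, given by its diagonal `c_diag`.
    """
    n = K.shape[1]
    Cinv_K = K / c_diag[:, None]          # C^{-1} K, shape (d, n)
    small = np.eye(n) + K.T @ Cinv_K      # shape (n, n)
    # Cinv_K.T equals K^T C^{-1} because a diagonal C is symmetric.
    return R @ np.linalg.solve(small, Cinv_K.T)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, d_out, n = 64, 32, 5               # hidden size, output size, edits
    K = rng.standard_normal((d, n))
    R = rng.standard_normal((d_out, n))
    c_diag = rng.uniform(0.5, 2.0, size=d)
    delta_a = update_direct(R, K, c_diag)
    delta_b = update_woodbury(R, K, c_diag)
    print(delta_a.shape, np.allclose(delta_a, delta_b))
```

When n is much smaller than d, the n × n system is the only linear solve required, which is the kind of low-dimensional inversion the abstract refers to. How the paper handles a non-diagonal C or the stacked expert weights is not specified in this listing.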
Problem

Research questions and friction points this paper is trying to address.

Knowledge Editing
Mixture-of-Experts
Large Language Models
Scalability
Sparse Architectures
Innovation

Methods, ideas, or system contributions that make the work stand out.

Knowledge Editing
Mixture-of-Experts
Tensor-Structured Updates
Woodbury Identity
Scalable LLMs