HodgeCover: Higher-Order Topological Coverage Drives Compression of Sparse Mixture-of-Experts

📅 2026-05-13

🤖 AI Summary
Existing learning-free compressors for sparse Mixture-of-Experts (MoE) models score experts using only pairwise compatibility, so they cannot detect higher-order cyclic obstructions: triples of experts that are each pairwise compatible yet form an irreducible cycle when merged together. This work models experts as the vertices of a 2-dimensional simplicial complex whose edges carry KL merge barriers and whose triangles carry triplet barriers. Hodge-decomposing the edge-barrier signal isolates the harmonic kernel of the simplicial Laplacian, and HodgeCover turns that kernel into a selection objective, greedily covering the harmonic-critical edges and triplet-critical triangles; a hybrid variant adds off-the-shelf weight pruning on the surviving experts. On three open-weight sparse MoE backbones under aggressive expert reduction, HodgeCover matches state-of-the-art learning-free baselines on the expert-reduction axis, leads on the aggressive-compression frontier of the hybrid axis, and balances retained mass across all four Hodge components.
📝 Abstract
Sparse Mixture-of-Experts (MoE) layers route tokens through a handful of experts, and learning-free compression of these layers reduces inference cost without retraining. A subtle obstruction blocks every existing compressor in this family: three experts can each be pairwise compatible yet form an irreducible cycle when merged together, so any score that ranks experts on pairwise signals is structurally blind to which triples are jointly mergeable. We show the obstruction is a precise mathematical object, the harmonic kernel of the simplicial Laplacian on a 2-complex whose vertices are experts, whose edges carry KL merge barriers, and whose faces carry triplet barriers; Hodge-decomposing the edge-barrier signal isolates the kernel exactly. We turn the diagnostic into a selection objective: HodgeCover greedily covers the harmonic-critical edges and triplet-critical triangles, and a hybrid variant of HodgeCover pairs it with off-the-shelf weight pruning on survivors. On three open-weight Sparse MoE backbones under aggressive expert reduction, HodgeCover matches state-of-the-art learning-free baselines on the expert-reduction axis, leads on the aggressive-compression frontier of the hybrid axis, and uniquely balances retained mass across all four Hodge components. These results show that exposing the harmonic kernel of a learned MoE structure changes which compressor wins at the regime that matters most.
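To make the harmonic-kernel idea concrete, the following is a minimal sketch of a Hodge decomposition of an edge signal on a small 2-complex. It is not the authors' implementation. The four-expert toy complex, the barrier values in `f`, the choice to treat those barriers as a signal on oriented edges, and the helper names `boundary_matrices` and `hodge_decompose` are all assumptions made for illustration. The sketch splits the edge signal into a gradient part, a curl part, and a harmonic remainder that lies in the kernel of the simplicial Laplacian.

```python
import numpy as np

def boundary_matrices(n_vertices, edges, triangles):
    """Build B1 (vertices x edges) and B2 (edges x triangles).

    Edges are (i, j) with i < j, oriented from i to j.
    Triangles are (i, j, k) with i < j < k; all three edges must be listed.
    """
    edge_index = {e: idx for idx, e in enumerate(edges)}
    B1 = np.zeros((n_vertices, len(edges)))
    for idx, (i, j) in enumerate(edges):
        B1[i, idx] = -1.0
        B1[j, idx] = 1.0
    B2 = np.zeros((len(edges), len(triangles)))
    for t, (i, j, k) in enumerate(triangles):
        B2[edge_index[(i, j)], t] = 1.0
        B2[edge_index[(j, k)], t] = 1.0
        B2[edge_index[(i, k)], t] = -1.0
    return B1, B2

def hodge_decompose(f, B1, B2):
    """Split edge signal f into gradient, curl, and harmonic components."""
    # Projection onto im(B1^T): the part explained by a vertex potential.
    grad = B1.T @ np.linalg.lstsq(B1.T, f, rcond=None)[0]
    # Projection onto im(B2): the part explained by filled triangles.
    curl = B2 @ np.linalg.lstsq(B2, f, rcond=None)[0]
    # The remainder lies in ker(B1) and ker(B2^T), i.e. the harmonic kernel.
    harm = f - grad - curl
    return grad, curl, harm

# Hypothetical toy complex: 4 experts, 5 edges, and one filled triangle.
# The cycle 1-2-3 is deliberately left unfilled, so it carries a harmonic mode.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
triangles = [(0, 1, 2)]
B1, B2 = boundary_matrices(4, edges, triangles)

# Made-up edge values standing in for pairwise merge barriers.
f = np.array([0.2, 0.5, 0.1, 0.4, 0.3])
grad, curl, harm = hodge_decompose(f, B1, B2)

# Simplicial (Hodge) Laplacian on edges; its kernel is the harmonic space.
L1 = B1.T @ B1 + B2 @ B2.T
```

In this toy case the harmonic component is nonzero and is annihilated by `L1`, which is the property the abstract relies on: the part of the edge signal that neither a vertex-level score nor any filled triangle can account for.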
Problem

Research questions and friction points this paper is trying to address.

Sparse Mixture-of-Experts
model compression
harmonic kernel
simplicial Laplacian
expert merging
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hodge decomposition
Mixture-of-Experts
topological coverage
simplicial Laplacian
learning-free compression