HodgeCover: Higher-Order Topological Coverage Drives Compression of Sparse Mixture-of-Experts

📅 2026-05-13

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing learning-free compression methods for sparse Mixture-of-Experts (MoE) models rank experts by pairwise compatibility alone, so they cannot detect higher-order cyclic obstructions: triplets of experts that are pairwise compatible yet irreducible when merged together. This work models experts as the vertices of a 2-dimensional simplicial complex whose edges carry KL-divergence merge barriers and whose triangles carry triplet barriers. Hodge-decomposing the edge-barrier signal isolates the harmonic kernel of the simplicial Laplacian, which then guides a greedy selection strategy that covers harmonic-critical edges and triplet-critical triangles. Under aggressive expert reduction on three open-weight sparse MoE backbones, the approach matches state-of-the-art learning-free baselines on expert reduction, leads at the aggressive-compression end when combined with weight pruning on the surviving experts, and balances retained mass across all four Hodge components.
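To make the Hodge decomposition step concrete, the following is a minimal NumPy sketch of the textbook decomposition of an edge signal on a small 2-complex. It is not the paper's implementation. The toy complex, the barrier values in `f`, the function names, and the choice to treat barriers as an oriented edge signal are all assumptions made for illustration. The triangle (0, 1, 2) is filled and the cycle 1-2-3 is left unfilled, so the harmonic space has dimension one.

```python
import numpy as np

def boundary_matrices(n_vertices, edges, triangles):
    """Build the vertex-edge (B1) and edge-triangle (B2) boundary matrices.

    Edges are (i, j) with i < j, oriented from i to j.
    Triangles are (i, j, k) with i < j < k.
    """
    edge_index = {e: col for col, e in enumerate(edges)}
    B1 = np.zeros((n_vertices, len(edges)))
    for col, (i, j) in enumerate(edges):
        B1[i, col], B1[j, col] = -1.0, 1.0
    B2 = np.zeros((len(edges), len(triangles)))
    for col, (i, j, k) in enumerate(triangles):
        B2[edge_index[(i, j)], col] = 1.0
        B2[edge_index[(i, k)], col] = -1.0
        B2[edge_index[(j, k)], col] = 1.0
    return B1, B2

def hodge_decompose(f, B1, B2):
    """Split an edge signal into gradient, curl, and harmonic parts."""
    # Orthogonal projection onto im(B1^T): the gradient part.
    grad = B1.T @ np.linalg.lstsq(B1.T, f, rcond=None)[0]
    # Orthogonal projection onto im(B2): the curl part.
    curl = B2 @ np.linalg.lstsq(B2, f, rcond=None)[0]
    # The remainder lies in ker(B1) and ker(B2^T), i.e. the Laplacian kernel.
    harmonic = f - grad - curl
    return grad, curl, harmonic

# Hypothetical toy complex: 4 experts, 5 edges, 1 filled triangle.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
triangles = [(0, 1, 2)]
B1, B2 = boundary_matrices(4, edges, triangles)

# Hypothetical edge-barrier values, one per edge (illustrative numbers only).
f = np.array([0.2, 0.1, 0.9, 0.4, 0.7])
grad, curl, harm = hodge_decompose(f, B1, B2)

# The harmonic part is annihilated by the Hodge 1-Laplacian.
L1 = B1.T @ B1 + B2 @ B2.T
print("harmonic part:", np.round(harm, 4))
print("max |L1 @ harm|:", np.abs(L1 @ harm).max())
```

In this toy case the harmonic part is supported mostly on the edges of the unfilled cycle 1-2-3, which is the kind of structure a purely pairwise score cannot see.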

📝 Abstract

Sparse Mixture-of-Experts (MoE) layers route tokens through a handful of experts, and learning-free compression of these layers reduces inference cost without retraining. A subtle obstruction blocks every existing compressor in this family: three experts can each be pairwise compatible yet form an irreducible cycle when merged together, so any score that ranks experts on pairwise signals is structurally blind to which triples are jointly mergeable. We show the obstruction is a precise mathematical object, the harmonic kernel of the simplicial Laplacian on a 2-complex whose vertices are experts, whose edges carry KL merge barriers, and whose faces carry triplet barriers; Hodge-decomposing the edge-barrier signal isolates the kernel exactly. We turn the diagnostic into a selection objective: HodgeCover greedily covers the harmonic-critical edges and triplet-critical triangles, and a hybrid variant of HodgeCover pairs it with off-the-shelf weight pruning on survivors. On three open-weight Sparse MoE backbones under aggressive expert reduction, HodgeCover matches state-of-the-art learning-free baselines on the expert-reduction axis, leads on the aggressive-compression frontier of the hybrid axis, and uniquely balances retained mass across all four Hodge components. These results show that exposing the harmonic kernel of a learned MoE structure changes which compressor wins at the regime that matters most.
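The abstract does not spell out the coverage rule, so the sketch below only illustrates the general shape of a greedy cover. It assumes, hypothetically, that a retained expert "covers" every critical edge or triangle it belongs to, and that critical simplices have already been identified. The function name `greedy_cover`, the budget parameter, and the example data are illustrative and are not taken from the paper.

```python
def greedy_cover(n_experts, critical_simplices, budget):
    """Pick up to `budget` experts, each time taking the one that touches
    the most critical simplices not yet covered (ties go to the lowest index)."""
    uncovered = {frozenset(s) for s in critical_simplices}
    kept = []
    while len(kept) < budget:
        candidates = [v for v in range(n_experts) if v not in kept]
        if not candidates:
            break
        # Gain of an expert = number of still-uncovered simplices containing it.
        best = max(candidates, key=lambda v: sum(1 for s in uncovered if v in s))
        kept.append(best)
        uncovered = {s for s in uncovered if best not in s}
    return kept, uncovered

# Hypothetical critical edges and triangles among 5 experts.
critical = [(1, 2), (2, 3), (1, 2, 3), (0, 4)]
kept, uncovered = greedy_cover(n_experts=5, critical_simplices=critical, budget=2)
print(kept, uncovered)
```

With a budget of two, the first pick is the expert shared by the most critical simplices, and the second pick handles the one simplex that remains.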

Problem

Research questions and friction points this paper is trying to address.

Sparse Mixture-of-Experts

model compression

harmonic kernel

simplicial Laplacian

expert merging

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hodge decomposition

Mixture-of-Experts

topological coverage