🤖 AI Summary
This work addresses a limitation of existing single-dimension fine-grained experts: performance saturates once the intermediate dimension is partitioned beyond a certain granularity, capping further gains. To overcome this, the paper introduces the first dual-dimensional fine-grained expert architecture, which partitions experts along both the intermediate and output dimensions. It proposes a two-level sparse feedforward computation scheme and a dedicated routing mechanism to enhance expert specialization, alongside an efficient model-upcycling strategy that enables low-cost construction. Evaluated across ten standard benchmarks, the approach substantially outperforms the strongest baseline, achieving a 6× improvement in parameter efficiency, a 281× reduction in prefill latency, and a 136× increase in decoding throughput.
📝 Abstract
As revealed by the scaling law of fine-grained MoE, model performance ceases to improve once the granularity of the intermediate dimension exceeds the optimal threshold, limiting further gains from single-dimension fine-grained design. To address this bottleneck, we propose FineRMoE (FineR-Grained MoE), an architecture that extends fine-grained expert design to both the intermediate and output dimensions, aiming to enhance expert specialization beyond the single-dimension limit. We further introduce a bi-level sparse forward computation paradigm and a specialized routing mechanism to govern expert activation. In addition, to obviate the prohibitive cost of training FineRMoE from scratch, we devise a generalized upcycling method to build FineRMoE in a cost-effective manner. Extensive experiments demonstrate the superior performance of FineRMoE across ten standard benchmarks. Compared with the strongest baseline, FineRMoE achieves 6 times higher parameter efficiency, 281 times lower prefill latency, and 136 times higher decoding throughput during inference.
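To make the bi-level sparsity idea concrete, here is a minimal numpy sketch of one plausible reading of the abstract: an FFN whose intermediate dimension and output dimension are each partitioned into slices, with two routers that each activate a top-k subset of slices per token. All names, sizes, the slice-wise routing scheme, and the use of separate softmax-weighted routers per dimension are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not from the paper).
d_model, d_inter, d_out = 8, 16, 8
n_inter_experts, n_out_experts = 4, 4   # slice counts along each dimension
k_inter, k_out = 2, 2                   # active slices per dimension

# One shared FFN, sliced column-wise along the intermediate (W1) and
# output (W2) dimensions to form fine-grained experts in each dimension.
W1 = rng.standard_normal((d_model, d_inter)) / np.sqrt(d_model)
W2 = rng.standard_normal((d_inter, d_out)) / np.sqrt(d_inter)
router_inter = rng.standard_normal((d_model, n_inter_experts))
router_out = rng.standard_normal((d_model, n_out_experts))

def top_k(scores, k):
    """Indices of the k largest scores plus their softmax weights."""
    idx = np.argsort(scores)[-k:]
    w = np.exp(scores[idx] - scores[idx].max())
    return idx, w / w.sum()

def bi_level_sparse_ffn(x):
    # Level 1: route over slices of the intermediate dimension;
    # only the selected columns of W1 are computed.
    i_idx, i_w = top_k(x @ router_inter, k_inter)
    slice_i = d_inter // n_inter_experts
    h = np.zeros(d_inter)
    for e, w in zip(i_idx, i_w):
        s = slice(e * slice_i, (e + 1) * slice_i)
        h[s] = w * np.maximum(x @ W1[:, s], 0.0)  # ReLU on active slice only
    # Level 2: route over slices of the output dimension;
    # inactive output slices stay exactly zero.
    o_idx, o_w = top_k(x @ router_out, k_out)
    slice_o = d_out // n_out_experts
    y = np.zeros(d_out)
    for e, w in zip(o_idx, o_w):
        s = slice(e * slice_o, (e + 1) * slice_o)
        y[s] = w * (h @ W2[:, s])
    return y

x = rng.standard_normal(d_model)
y = bi_level_sparse_ffn(x)   # y has shape (d_out,); most slices are zero
```

The point of the sketch is the cost structure: both matrix multiplies touch only the routed slices, so compute scales with `k_inter`/`k_out` rather than the full dimensions, which is the kind of two-level sparsity the prefill-latency and decoding-throughput gains would rely on.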