FourierMoE: Fourier Mixture-of-Experts Adaptation of Large Language Models

📅 2026-04-02
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the task interference and limited representational capacity that existing parameter-efficient fine-tuning methods suffer in multi-task settings. It introduces, for the first time, a shift from spatial-domain to spectral-domain adaptation of large language models through a frequency-aware mixture-of-experts architecture. The approach learns adaptation parameters in the frequency domain and reconstructs real-valued spatial weights via the inverse discrete Fourier transform, employs a frequency-adaptive routing strategy, and uses a conjugate-symmetric complex-valued parameterization that preserves full phase and magnitude information, guaranteeing lossless reconstruction of real-valued weights. Evaluated across 28 benchmarks and diverse model architectures and scales, the method consistently outperforms state-of-the-art techniques, achieving superior performance in both single-task and multi-task scenarios with fewer trainable parameters.
📝 Abstract
Parameter-efficient fine-tuning (PEFT) has emerged as a crucial paradigm for adapting large language models (LLMs) under constrained computational budgets. However, standard PEFT methods often struggle in multi-task fine-tuning settings, where diverse optimization objectives induce task interference and limited parameter budgets lead to representational deficiency. While recent approaches incorporate mixture-of-experts (MoE) to alleviate these issues, they predominantly operate in the spatial domain, which may introduce structural redundancy and parameter overhead. To overcome these limitations, we reformulate adaptation in the spectral domain. Our spectral analysis reveals that different tasks exhibit distinct frequency energy distributions, and that LLM layers display heterogeneous frequency sensitivities. Motivated by these insights, we propose FourierMoE, which integrates the MoE architecture with the inverse discrete Fourier transform (IDFT) for frequency-aware adaptation. Specifically, FourierMoE employs a frequency-adaptive router to dispatch tokens to experts specialized in distinct frequency bands. Each expert learns a set of conjugate-symmetric complex coefficients, preserving complete phase and amplitude information while theoretically guaranteeing lossless IDFT reconstruction into real-valued spatial weights. Extensive evaluations across 28 benchmarks, multiple model architectures, and scales demonstrate that FourierMoE consistently outperforms competitive baselines in both single-task and multi-task settings while using significantly fewer trainable parameters. These results highlight the promise of spectral-domain expert adaptation as an effective and parameter-efficient paradigm for LLM fine-tuning.
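The abstract's claim that conjugate-symmetric complex coefficients guarantee lossless IDFT reconstruction into real-valued weights follows from a standard DFT property: a spectrum satisfying X[n−k] = conj(X[k]) inverse-transforms to a purely real signal. A minimal NumPy sketch of that property (an illustration under assumed sizes, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8  # number of real-valued spatial weights to reconstruct (illustrative)

# Store only the non-negative-frequency half-spectrum; conjugate symmetry
# X[n-k] = conj(X[k]) is implied by np.fft.irfft, which is exactly the
# condition for the inverse DFT output to be real-valued.
half = rng.standard_normal(n // 2 + 1) + 1j * rng.standard_normal(n // 2 + 1)
# For an even-length real signal, the DC and Nyquist bins must be real.
half[0] = half[0].real
half[-1] = half[-1].real

# Reconstruct real spatial weights via the inverse DFT (no imaginary residue).
w = np.fft.irfft(half, n=n)

# Round trip: the forward real FFT recovers the stored coefficients exactly
# (up to floating point), i.e. phase and amplitude are fully preserved.
assert np.allclose(np.fft.rfft(w), half)
```

Because only ⌊n/2⌋+1 complex bins are stored rather than n, the parameterization carries no redundant degrees of freedom, which is consistent with the paper's emphasis on parameter efficiency.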
Problem

Research questions and friction points this paper is trying to address.

parameter-efficient fine-tuning
multi-task fine-tuning
task interference
mixture-of-experts
spectral domain
Innovation

Methods, ideas, or system contributions that make the work stand out.

FourierMoE
spectral-domain adaptation
mixture-of-experts
parameter-efficient fine-tuning
frequency-aware routing