Leveraging Multimodal Large Language Models for All-in-One Image Restoration via a Mixture of Frequency Experts

📅 2026-05-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge of uniformly modeling diverse and unknown composite image degradations, which existing methods struggle to handle effectively. To this end, we propose a multimodal large language model (MLLM)-guided mixture-of-experts mechanism operating in the frequency domain. Our approach leverages semantic embeddings generated by an MLLM to enrich degradation-aware representations and introduces a frequency-domain mixture-of-experts module that adaptively fuses frequency experts based on contextual cues. Furthermore, we devise a relation-aligned routing strategy coupled with a dedicated loss function to explicitly capture the continuous structural relationships among different degradations. Extensive experiments demonstrate state-of-the-art performance across multiple benchmarks, with our method achieving a notable gain of up to 1.35 dB on the CDD11 dataset.
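The summary does not give the form of the relational alignment loss, so the following is only a minimal sketch of one plausible reading: the pairwise similarity structure of the router's expert weights is pushed toward the pairwise similarity structure of the MLLM embeddings. The use of cosine similarity, the mean-squared penalty, and the function names `cosine`, `similarity_matrix`, and `relational_alignment_loss` are all assumptions made for illustration, not the paper's definition.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Small epsilon guards against division by zero for all-zero vectors.
    return dot / (norm_u * norm_v + 1e-12)


def similarity_matrix(vectors):
    """Pairwise cosine similarities for a batch of vectors."""
    return [[cosine(u, v) for v in vectors] for u in vectors]


def relational_alignment_loss(embeddings, routing_weights):
    """Assumed form: mean squared gap between the two similarity matrices.

    embeddings      -- one (hypothetical) MLLM embedding per degraded input
    routing_weights -- one expert-weight vector per degraded input
    """
    s_embed = similarity_matrix(embeddings)
    s_route = similarity_matrix(routing_weights)
    n = len(embeddings)
    total = sum(
        (s_embed[i][j] - s_route[i][j]) ** 2
        for i in range(n)
        for j in range(n)
    )
    return total / (n * n)


# Two inputs whose embeddings are orthogonal but whose routing is identical:
# the loss is positive because the router ignores the embedding-space relation.
example_loss = relational_alignment_loss(
    embeddings=[[1.0, 0.0], [0.0, 1.0]],
    routing_weights=[[1.0, 0.0], [1.0, 0.0]],
)
```

Under this assumed form, the loss is near zero whenever inputs that are close in embedding space receive similar expert mixtures and distant inputs receive dissimilar ones, which is one way to express a continuous rather than categorical relation among degradations.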

📝 Abstract

All-in-one image restoration seeks to recover clean images from inputs affected by diverse and unknown degradations using a unified framework. Recent methods have shown strong performance by identifying degradation characteristics to guide the restoration process. However, many of them treat degradations as discrete categories, which limits their ability to model the continuous relational structure that arises in composite degradations. To address this issue, we propose a multimodal large language model (MLLM)-guided image restoration framework that exploits multimodal embeddings as guidance for low-level restoration. Specifically, MLLM-derived features are injected into an encoder-decoder architecture through an MLLM-guided fusion block (MGFB) to enhance degradation-aware representations. In addition, we incorporate a mixture-of-frequency-experts (MoFE) module that adaptively combines frequency experts using MLLM-guided contextual cues. To further improve expert routing, we design an MLLM-guided router with a relational alignment loss that encourages routing patterns consistent with the embedding-space relationships of degraded inputs. Extensive experiments on multiple benchmarks show that the proposed method achieves strong performance across diverse restoration settings and establishes a new state of the art on the challenging CDD11 dataset, outperforming previous methods by up to 1.35 dB.
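To make the idea of adaptively combining frequency experts concrete, here is a toy sketch on a 1D signal using only the Python standard library. It is not the paper's MoFE module: the experts are reduced to fixed per-frequency gains, the router is a softmax over dot products between a guidance vector and per-expert keys, and every name (`mofe`, `band_gains`, `expert_keys`, `expert_gains`, `guidance`) is hypothetical. The real method operates on image features inside an encoder-decoder, and its expert and router designs are not described in the abstract.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (fine for short toy signals)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse DFT, returning the real part of each sample."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def band_gains(n, cutoff, low_gain, high_gain):
    """Toy 'expert': one gain for low frequencies, another for high ones.

    Bins k and n - k share a band, so the output signal stays real.
    """
    return [low_gain if min(k, n - k) <= cutoff else high_gain for k in range(n)]


def mofe(signal, guidance, expert_keys, expert_gains):
    """Mix frequency experts with weights derived from a guidance vector."""
    # Router (assumed form): softmax over guidance-key dot products.
    logits = [sum(g * w for g, w in zip(guidance, key)) for key in expert_keys]
    weights = softmax(logits)
    spectrum = dft(signal)
    # Each bin is scaled by the weighted sum of the experts' gains.
    mixed = [
        sum(w * gains[k] for w, gains in zip(weights, expert_gains)) * spectrum[k]
        for k in range(len(signal))
    ]
    return idft(mixed), weights


signal = [1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0, 0.0]
experts = [
    band_gains(len(signal), cutoff=1, low_gain=1.0, high_gain=0.0),  # smoothing
    band_gains(len(signal), cutoff=1, low_gain=1.0, high_gain=2.0),  # sharpening
]
keys = [[1.0, 0.0], [0.0, 1.0]]
restored, weights = mofe(signal, guidance=[2.0, 0.0], expert_keys=keys, expert_gains=experts)
```

In this toy setup, a guidance vector aligned with the first key shifts the mixture toward the smoothing expert, while one aligned with the second key favors sharpening. Intermediate guidance vectors yield intermediate mixtures, which is the property a continuous treatment of composite degradations would rely on.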

Problem

Research questions and friction points this paper is trying to address.

image restoration

composite degradations

continuous degradation modeling

all-in-one restoration

degradation-aware representation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal Large Language Model

Mixture of Frequency Experts

All-in-One Image Restoration