🤖 AI Summary
This work addresses the challenge of modeling diverse and unknown composite image degradations within a single unified framework, a setting that existing methods struggle to handle effectively. To this end, we propose a multimodal large language model (MLLM)-guided mixture-of-experts mechanism operating in the frequency domain. Our approach leverages semantic embeddings generated by an MLLM to enrich degradation-aware representations and introduces a frequency-domain mixture-of-experts module that adaptively fuses frequency experts based on contextual cues. Furthermore, we devise a relation-aligned routing strategy coupled with a dedicated loss function to explicitly capture the continuous structural relationships among different degradations. Extensive experiments show strong performance across multiple benchmarks and a new state of the art on the CDD11 dataset, where our method outperforms previous methods by up to 1.35 dB.
📝 Abstract
All-in-one image restoration seeks to recover clean images from inputs affected by diverse and unknown degradations using a unified framework. Recent methods have shown strong performance by identifying degradation characteristics to guide the restoration process. However, many of them treat degradations as discrete categories, which limits their ability to model the continuous relational structure that arises in composite degradations. To address this issue, we propose a multimodal large language model (MLLM)-guided image restoration framework that exploits multimodal embeddings as guidance for low-level restoration. Specifically, MLLM-derived features are injected into an encoder-decoder architecture through an MLLM-guided fusion block (MGFB) to enhance degradation-aware representations. In addition, we incorporate a mixture-of-frequency-experts (MoFE) module that adaptively combines frequency experts using MLLM-guided contextual cues. To further improve expert routing, we design an MLLM-guided router with a relational alignment loss that encourages routing patterns consistent with the embedding-space relationships of degraded inputs. Extensive experiments on multiple benchmarks show that the proposed method achieves strong performance across diverse restoration settings and establishes a new state of the art on the challenging CDD11 dataset, outperforming previous methods by up to 1.35 dB.
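The abstract does not give the form of the relational alignment loss, so the following is only a minimal sketch of one plausible reading: the pairwise similarity structure of the routing weights in a batch is pushed toward the pairwise similarity structure of the MLLM embeddings of the same degraded inputs. The function names (`softmax`, `cosine`, `similarity_matrix`, `relational_alignment_loss`), the use of cosine similarity, and the mean-squared-error penalty are all assumptions made for illustration and are not taken from the paper.

```python
import math


def softmax(logits):
    """Turn router logits into expert mixing weights that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def similarity_matrix(rows):
    """Pairwise cosine similarities between all rows of a batch."""
    return [[cosine(a, b) for b in rows] for a in rows]


def relational_alignment_loss(embeddings, router_logits):
    """Assumed form of the loss (hypothetical, not the paper's definition).

    embeddings:    one MLLM embedding per degraded input in the batch
    router_logits: one vector of expert logits per input in the batch
    Returns the mean squared difference between the two similarity matrices.
    """
    routing = [softmax(logits) for logits in router_logits]
    s_emb = similarity_matrix(embeddings)
    s_route = similarity_matrix(routing)
    n = len(embeddings)
    sq_err = sum(
        (s_emb[i][j] - s_route[i][j]) ** 2 for i in range(n) for j in range(n)
    )
    return sq_err / (n * n)
```

Under this assumed form, two inputs whose embeddings are close (for example haze and haze plus rain) are encouraged to use similar expert mixtures, while inputs with distant embeddings are free to route differently. The loss is zero when the two similarity matrices coincide and grows as the routing pattern diverges from the embedding-space relationships.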