🤖 AI Summary
Unsupervised restoration of images degraded by multiple factors (e.g., blur, noise, rain streaks, compression artifacts) faces key bottlenecks: high computational cost from iterative search, reliance on labeled degradation types, and dependence on ground-truth reference images.
Method: We propose a lightweight sequential decision-making agent trained with what the authors present as the first MLLM-driven no-reference perceptual reward mechanism, which assesses restoration quality without degradation labels or reference images. The approach uses a policy gradient reinforcement learning framework in which the MLLM serves as a learnable evaluator and a lightweight network generates deterministic tool-invocation sequences.
Contribution/Results: Without any supervision, the method matches state-of-the-art performance on full-reference metrics (e.g., PSNR, SSIM), surpasses prior work on no-reference metrics (e.g., NIQE, BRISQUE), and substantially accelerates inference, demonstrating strong efficiency and generalizability for multi-degradation restoration.
📝 Abstract
Complex image restoration aims to recover high-quality images from inputs affected by multiple degradations such as blur, noise, rain, and compression artifacts. Recent restoration agents, powered by vision-language models and large language models, offer promising restoration capabilities but suffer from significant efficiency bottlenecks due to reflection, rollback, and iterative tool searching. Moreover, their performance depends heavily on degradation recognition models that require extensive annotations for training, limiting their applicability in label-free environments. To address these limitations, we propose a policy optimization-based restoration framework that learns a lightweight agent to determine tool-calling sequences. The agent operates in a sequential decision process, selecting the most appropriate restoration operation at each step to maximize final image quality. To enable training in label-free environments, we introduce a novel reward mechanism driven by multimodal large language models, which act as human-aligned evaluators and provide perceptual feedback for policy improvement. Once trained, our agent executes a deterministic restoration plan without redundant tool invocations, significantly accelerating inference while maintaining high restoration quality. Extensive experiments show that despite using no supervision, our method matches SOTA performance on full-reference metrics and surpasses existing approaches on no-reference metrics across diverse degradation scenarios.
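To make the training loop concrete, below is a minimal REINFORCE-style sketch of the core idea: a lightweight softmax policy selects a restoration tool, the resulting image is scored by a no-reference evaluator, and the score reinforces the sampled choice. Everything here is illustrative, not the paper's implementation: the `TOOLS` list, the feature vector, and `mllm_reward` (a numeric stand-in for querying a multimodal LLM for a perceptual quality score) are all hypothetical.

```python
import numpy as np

# Hypothetical tool vocabulary; the paper's actual toolset may differ.
TOOLS = ["denoise", "deblur", "derain", "dejpeg", "stop"]

def mllm_reward(image_state):
    # Stand-in for the MLLM's no-reference perceptual score in [0, 1].
    # In the paper, a multimodal LLM would judge the restored image directly.
    return float(np.clip(image_state.mean(), 0.0, 1.0))

class ToolPolicy:
    """Lightweight linear-softmax policy over tools, trained with REINFORCE."""

    def __init__(self, n_features, n_tools, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.01, size=(n_features, n_tools))
        self.lr = lr

    def probs(self, feats):
        logits = feats @ self.W
        logits -= logits.max()            # numerical stability
        e = np.exp(logits)
        return e / e.sum()

    def sample(self, feats, rng):
        p = self.probs(feats)
        return int(rng.choice(len(p), p=p)), p

    def update(self, feats, action, reward):
        # REINFORCE: W += lr * R * grad of log pi(action | feats).
        # For linear-softmax, grad wrt logits is (one_hot(action) - probs).
        grad_logits = -self.probs(feats)
        grad_logits[action] += 1.0
        self.W += self.lr * reward * np.outer(feats, grad_logits)

# One training step: sample a tool, score the (pretend) restored image with
# the stubbed MLLM reward, and reinforce the sampled tool choice.
rng = np.random.default_rng(0)
policy = ToolPolicy(n_features=4, n_tools=len(TOOLS))
feats = np.array([0.2, 0.5, 0.1, 0.7])    # toy degraded-image features
action, _ = policy.sample(feats, rng)
restored = np.full((8, 8), 0.6)           # pretend output of the chosen tool
policy.update(feats, action, mllm_reward(restored))
```

Because the reward is attached directly to the sampled action's log-probability, tools that the evaluator scores highly become more likely at inference time, which is what lets the trained agent emit a fixed tool sequence without iterative search.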