TIR-Agent: Training an Explorative and Efficient Agent for Image Restoration

📅 2026-03-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing image restoration agents rely on untrained heuristic scheduling and exhaustive tool invocation, resulting in suboptimal restoration pathways and high computational overhead. This work proposes TIR-Agent, a trainable image restoration agent that learns efficient task scheduling and tool composition strategies through a two-stage training paradigm combining supervised fine-tuning and reinforcement learning. The approach innovatively incorporates stochastic perturbations to enhance exploration and introduces a multidimensional adaptive reward mechanism that dynamically fuses image quality metrics to mitigate reward hacking. Experimental results demonstrate that TIR-Agent outperforms twelve baseline models across both in-domain and out-of-domain degradation scenarios, achieving over 2.5× faster inference and significantly reducing redundant tool calls.
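The summary's "multidimensional adaptive reward" that fuses image quality metrics while resisting reward hacking could plausibly look like the following minimal sketch. The paper's exact formulation is not given here; the function name, the running-mean inputs, and the softmax-over-deviations weighting are all assumptions for illustration.

```python
import math

def adaptive_reward(metrics, running_means, temperature=1.0):
    """Fuse normalized quality metrics with adaptive weights.

    metrics: dict name -> normalized score in [0, 1].
    running_means: dict name -> recent average of that score.
    A metric far above its running mean (a reward-hacking signature)
    receives lower weight via a softmax over negative deviations.
    """
    names = list(metrics)
    deviations = [metrics[n] - running_means[n] for n in names]
    # Softmax over negative deviations: suspiciously inflated metrics weigh less.
    exps = [math.exp(-d / temperature) for d in deviations]
    z = sum(exps)
    weights = [e / z for e in exps]
    return sum(w * metrics[n] for w, n in zip(weights, names))
```

Under this scheme, a policy that inflates one metric while the others stagnate sees that metric's contribution shrink, so the fused reward grows more slowly than a fixed-weight average would.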
📝 Abstract
Vision-language agents that orchestrate specialized tools for image restoration (IR) have emerged as a promising approach, yet most existing frameworks operate in a training-free manner. They rely on heuristic task scheduling and exhaustive tool traversal, resulting in sub-optimal restoration paths and prohibitive computational cost. We argue that the core bottleneck lies in the absence of a learned decision-making policy, as a vision-language model cannot efficiently handle degradation-aware task ordering and tool composition on its own. To this end, we propose TIR-Agent, a trainable image restoration agent that learns a direct tool-calling policy through a two-stage training pipeline of supervised fine-tuning (SFT) followed by reinforcement learning (RL). Two key designs underpin effective RL training: (i) a random perturbation strategy applied to the SFT data, which broadens the policy's exploration over task schedules and tool compositions, and (ii) a multi-dimensional adaptive reward mechanism that dynamically re-weights heterogeneous image quality metrics to mitigate reward hacking. To support high-throughput, asynchronous GPU-based tool invocation during training, we further develop a globally shared model-call pool. Experiments on both in-domain and out-of-domain degradations show that TIR-Agent outperforms 12 baselines, including 6 all-in-one models, 3 training-free agents, and 3 proprietary models, and achieves over 2.5$\times$ inference speedup by eliminating redundant tool executions.
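The abstract's "globally shared model-call pool" for asynchronous tool invocation could be sketched as below. This is a minimal illustration, not the paper's implementation: the class name, the `ThreadPoolExecutor` backend, and the placeholder string-based tools are all assumptions; in practice each callable would wrap a GPU-resident restoration model.

```python
from concurrent.futures import ThreadPoolExecutor

class ToolCallPool:
    """Hypothetical shared pool: many training rollouts submit tool calls
    to one executor instead of each loading its own copy of every tool."""

    def __init__(self, tools, max_workers=4):
        self.tools = tools  # name -> callable (stand-in for a GPU tool)
        self.pool = ThreadPoolExecutor(max_workers=max_workers)

    def submit(self, tool_name, image):
        # Returns a Future immediately; the rollout continues while the
        # tool runs, which is what makes training high-throughput.
        return self.pool.submit(self.tools[tool_name], image)

# Usage: two rollouts share the same pool rather than duplicating models.
pool = ToolCallPool({"denoise": lambda x: x + "+denoised",
                     "derain": lambda x: x + "+derained"})
futures = [pool.submit("denoise", "img0"), pool.submit("derain", "img1")]
print([f.result() for f in futures])  # ['img0+denoised', 'img1+derained']
```

Sharing one pool across rollouts amortizes tool (model) residency in GPU memory and lets the RL loop overlap policy decoding with tool execution.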
Problem

Research questions and friction points this paper is trying to address.

image restoration
vision-language agent
tool composition
task scheduling
learned policy
Innovation

Methods, ideas, or system contributions that make the work stand out.

trainable agent
tool-calling policy
reinforcement learning
image restoration
adaptive reward