TIR-Agent: Training an Explorative and Efficient Agent for Image Restoration

📅 2026-03-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing image restoration agents rely on untrained heuristic scheduling and exhaustive tool invocation, resulting in suboptimal restoration pathways and high computational overhead. This work proposes TIR-Agent, a trainable image restoration agent that learns efficient task scheduling and tool composition strategies through a two-stage training paradigm combining supervised fine-tuning and reinforcement learning. The approach innovatively incorporates stochastic perturbations to enhance exploration and introduces a multidimensional adaptive reward mechanism that dynamically fuses image quality metrics to mitigate reward hacking. Experimental results demonstrate that TIR-Agent outperforms twelve baseline models across both in-domain and out-of-domain degradation scenarios, achieving over 2.5× faster inference and significantly reducing redundant tool calls.
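The summary's "multidimensional adaptive reward" that fuses image quality metrics while resisting reward hacking could plausibly look like the following minimal sketch. The paper's exact formulation is not given here; the function name, the running-mean inputs, and the softmax-over-deviations weighting are all assumptions for illustration.

```python
import math

def adaptive_reward(metrics, running_means, temperature=1.0):
    """Fuse normalized quality metrics with adaptive weights.

    metrics: dict name -> normalized score in [0, 1].
    running_means: dict name -> recent average of that score.
    A metric far above its running mean (a reward-hacking signature)
    receives lower weight via a softmax over negative deviations.
    """
    names = list(metrics)
    deviations = [metrics[n] - running_means[n] for n in names]
    # Softmax over negative deviations: suspiciously inflated metrics weigh less.
    exps = [math.exp(-d / temperature) for d in deviations]
    z = sum(exps)
    weights = [e / z for e in exps]
    return sum(w * metrics[n] for w, n in zip(weights, names))
```

Under this scheme, a policy that inflates one metric while the others stagnate sees that metric's contribution shrink, so the fused reward grows more slowly than a fixed-weight average would.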
📝 Abstract
Vision-language agents that orchestrate specialized tools for image restoration (IR) have emerged as a promising approach, yet most existing frameworks operate in a training-free manner. They rely on heuristic task scheduling and exhaustive tool traversal, resulting in sub-optimal restoration paths and prohibitive computational cost. We argue that the core bottleneck lies in the absence of a learned decision-making policy, as a vision-language model cannot efficiently handle degradation-aware task ordering and tool composition on its own. To this end, we propose TIR-Agent, a trainable image restoration agent that learns a direct tool-calling policy through a two-stage training pipeline of supervised fine-tuning (SFT) followed by reinforcement learning (RL). Two key designs underpin effective RL training: (i) a random perturbation strategy applied to the SFT data, which broadens the policy's exploration over task schedules and tool compositions, and (ii) a multi-dimensional adaptive reward mechanism that dynamically re-weights heterogeneous image quality metrics to mitigate reward hacking. To support high-throughput, asynchronous GPU-based tool invocation during training, we further develop a globally shared model-call pool. Experiments on both in-domain and out-of-domain degradations show that TIR-Agent outperforms 12 baselines, including 6 all-in-one models, 3 training-free agents, and 3 proprietary models, and achieves over 2.5$\times$ inference speedup by eliminating redundant tool executions.
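The abstract's "globally shared model-call pool" for asynchronous tool invocation could be sketched as below. This is a minimal illustration, not the paper's implementation: the class name, the `ThreadPoolExecutor` backend, and the placeholder string-based tools are all assumptions; in practice each callable would wrap a GPU-resident restoration model.

```python
from concurrent.futures import ThreadPoolExecutor

class ToolCallPool:
    """Hypothetical shared pool: many training rollouts submit tool calls
    to one executor instead of each loading its own copy of every tool."""

    def __init__(self, tools, max_workers=4):
        self.tools = tools  # name -> callable (stand-in for a GPU tool)
        self.pool = ThreadPoolExecutor(max_workers=max_workers)

    def submit(self, tool_name, image):
        # Returns a Future immediately; the rollout continues while the
        # tool runs, which is what makes training high-throughput.
        return self.pool.submit(self.tools[tool_name], image)

# Usage: two rollouts share the same pool rather than duplicating models.
pool = ToolCallPool({"denoise": lambda x: x + "+denoised",
                     "derain": lambda x: x + "+derained"})
futures = [pool.submit("denoise", "img0"), pool.submit("derain", "img1")]
print([f.result() for f in futures])  # ['img0+denoised', 'img1+derained']
```

Sharing one pool across rollouts amortizes tool (model) residency in GPU memory and lets the RL loop overlap policy decoding with tool execution.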
Problem

Research questions and friction points this paper is trying to address.

image restoration
vision-language agent
tool composition
task scheduling
learned policy
Innovation

Methods, ideas, or system contributions that make the work stand out.

trainable agent
tool-calling policy
reinforcement learning
image restoration
adaptive reward