A Preliminary Study for GPT-4o on Image Restoration

📅 2025-05-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work presents the first systematic evaluation of GPT-4o on image restoration tasks, including dehazing, deraining, and low-light enhancement. It finds that GPT-4o's outputs are visually plausible but suffer from structural distortions (e.g., geometric scaling artifacts, object misalignment, and viewpoint shifts), which makes them unsuitable as direct replacements for conventional methods. Method: The paper proposes treating GPT-4o's outputs as visual priors ("Large Language Models' Outputs as Visual Priors"), using its multimodal generations as weak supervision signals to fine-tune classical CNN-based restoration networks (e.g., DehazeNet) and thereby removing the reliance on pixel-level ground-truth annotations. Contribution/Results: The approach improves PSNR and SSIM across multiple benchmarks. To support reproducibility and benchmarking, the authors will release GPT-4o-restored images for more than 10 widely used image restoration datasets, together with a baseline framework for integrating GPT-4o into image restoration pipelines.

📝 Abstract

OpenAI's GPT-4o model, integrating multi-modal inputs and outputs within an autoregressive architecture, has demonstrated unprecedented performance in image generation. In this work, we investigate its potential impact on the image restoration community. We present the first systematic evaluation of GPT-4o across diverse restoration tasks. Our experiments reveal that, although restoration outputs from GPT-4o are visually appealing, they often suffer from poor pixel-level structural fidelity when compared to ground-truth images. Common issues are variations in image proportions, shifts in object positions and quantities, and changes in viewpoint. To address these issues, taking image dehazing, deraining, and low-light enhancement as representative case studies, we show that GPT-4o's outputs can serve as powerful visual priors, substantially enhancing the performance of existing dehazing networks. This study offers practical guidelines and a baseline framework to facilitate the integration of GPT-4o into future image restoration pipelines. We hope that it will accelerate innovation in the broader field of image generation. To support further research, we will release GPT-4o-restored images from over 10 widely used image restoration datasets.
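To make the gap between "visually appealing" and "pixel-level fidelity" concrete, the following is a minimal sketch of PSNR, one common pixel-level metric for comparing a restored image with its ground truth. The toy images and variable names are illustrative assumptions, not data or code from the paper. The sketch shows that a structure shifted by a single pixel scores far worse than a correctly placed structure with a small brightness error.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio between two equally sized grayscale images.

    Images are lists of rows of pixel intensities.
    """
    if len(reference) != len(restored) or any(
        len(ref_row) != len(res_row)
        for ref_row, res_row in zip(reference, restored)
    ):
        raise ValueError("images must have identical dimensions")
    pixel_count = sum(len(row) for row in reference)
    if pixel_count == 0:
        raise ValueError("images must not be empty")
    squared_error = sum(
        (a - b) ** 2
        for ref_row, res_row in zip(reference, restored)
        for a, b in zip(ref_row, res_row)
    )
    mse = squared_error / pixel_count
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy 4x4 scene (illustrative only): a bright vertical bar on a dark background.
ground_truth = [[0, 255, 0, 0] for _ in range(4)]
# Same bar shifted one pixel to the right: looks similar, but is pixel-wise wrong.
shifted = [[0, 0, 255, 0] for _ in range(4)]
# Bar in the correct position with a small brightness error.
slightly_dim = [[0, 245, 0, 0] for _ in range(4)]

psnr_shifted = psnr(ground_truth, shifted)
psnr_dim = psnr(ground_truth, slightly_dim)
```

Because PSNR compares pixels at identical coordinates, shifts in object position or viewpoint are penalized heavily even when the result looks plausible to a human viewer.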

Problem

Research questions and friction points this paper is trying to address.

Evaluating GPT-4o's performance in diverse image restoration tasks

Addressing pixel-level structural fidelity issues in GPT-4o's outputs

Enhancing existing dehazing networks using GPT-4o's visual priors
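One of the fidelity issues named in the abstract is a change in image proportions. A minimal sketch of how such a change could be flagged before any pixel-level comparison is given here; the function names, the 1% tolerance, and the example sizes are hypothetical and do not come from the paper.

```python
def aspect_ratio_drift(input_size, output_size):
    """Relative change in width/height ratio between two (width, height) sizes."""
    in_w, in_h = input_size
    out_w, out_h = output_size
    if min(in_w, in_h, out_w, out_h) <= 0:
        raise ValueError("sizes must be positive")
    in_ratio = in_w / in_h
    out_ratio = out_w / out_h
    return abs(out_ratio - in_ratio) / in_ratio


def proportions_changed(input_size, output_size, tolerance=0.01):
    """True when the output's aspect ratio differs by more than `tolerance`."""
    return aspect_ratio_drift(input_size, output_size) > tolerance


# Hypothetical sizes: a 620x460 degraded input and a square 1024x1024 output.
drift = aspect_ratio_drift((620, 460), (1024, 1024))
flagged = proportions_changed((620, 460), (1024, 1024))
# A pure 2x upscale keeps the proportions and is not flagged.
same = proportions_changed((640, 480), (1280, 960))
```

A flagged output would need to be resized or cropped back to the input geometry before metrics are computed or before it is paired with the degraded image.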

Innovation

Methods, ideas, or system contributions that make the work stand out.

GPT-4o integrates multi-modal inputs and outputs within an autoregressive architecture

GPT-4o outputs serve as powerful visual priors

Framework integrates GPT-4o into image restoration pipelines
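The source does not say how the GPT-4o output is fed to a restoration network. One plausible option, shown here purely as an assumption, is to stack the prior onto the degraded image along the channel axis so that the network sees both. The function name `concat_prior` and the pixel values are hypothetical.

```python
def concat_prior(degraded, prior):
    """Stack a prior image onto a degraded image along the channel axis.

    Both images are lists of rows; each pixel is a tuple of channel values.
    This is an assumed fusion scheme, not the paper's confirmed method.
    """
    if len(degraded) != len(prior) or any(
        len(d_row) != len(p_row) for d_row, p_row in zip(degraded, prior)
    ):
        raise ValueError("prior must be resized to the degraded image first")
    return [
        [tuple(d_px) + tuple(p_px) for d_px, p_px in zip(d_row, p_row)]
        for d_row, p_row in zip(degraded, prior)
    ]


# Toy 1x2 RGB images with made-up values.
hazy = [[(200, 200, 200), (180, 180, 180)]]
gpt4o_prior = [[(90, 120, 60), (30, 60, 150)]]
network_input = concat_prior(hazy, gpt4o_prior)  # 6 channels per pixel
```

With this scheme the first layer of the restoration network would need to accept six input channels instead of three, and the prior must share the input's spatial size.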
