Banana100: Breaking NR-IQA Metrics by 100 Iterative Image Replications with Nano Banana Pro

📅 2026-04-03
🤖 AI Summary
This work addresses the implicit image quality degradation that accumulates during multi-round image editing, a phenomenon poorly detected by existing no-reference image quality assessment (NR-IQA) methods. To investigate this issue, the authors construct Banana100, a dataset comprising 28,000 images generated through 100 iterative editing rounds using multimodal agents, enabling systematic analysis of quality decay patterns. The study reveals, for the first time, the widespread failure of NR-IQA metrics in iteratively generated scenarios. Experimental results demonstrate that 21 state-of-the-art NR-IQA methods consistently fail to identify severely degraded images, highlighting a dual vulnerability in both generative and evaluative systems. The authors release all code and data to establish a new benchmark for robust image generation and quality assessment.
📝 Abstract
The multi-step, iterative image editing capabilities of multi-modal agentic systems have transformed digital content creation. Although the latest image editing models faithfully follow instructions and generate high-quality images in single-turn edits, we identify a critical weakness in multi-turn editing: the iterative degradation of image quality. As images are repeatedly edited, minor artifacts accumulate, rapidly leading to severe visible noise and failures to follow even simple editing instructions. To systematically study these failures, we introduce Banana100, a comprehensive dataset of 28,000 degraded images generated through 100 iterative editing steps, covering diverse textures and image content. Alarmingly, image quality evaluators fail to detect the degradation: among 21 popular no-reference image quality assessment (NR-IQA) metrics, none consistently assigns lower scores to heavily degraded images than to clean ones. These dual failures of generators and evaluators may threaten the stability of future model training and the safety of deployed agentic systems if low-quality synthetic data generated by multi-turn edits escapes quality filters. We release the full code and data to facilitate the development of more robust models, helping to mitigate the fragility of multi-modal agentic systems.
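The evaluation protocol the abstract describes, i.e. repeatedly re-editing an image and scoring every round with an NR-IQA metric, can be sketched as below. This is a toy illustration only: `edit` (a noise-injecting stand-in for an editing model) and `nr_iqa_score` (a crude gradient-energy sharpness proxy) are hypothetical placeholders, not the paper's actual agents or any of the 21 metrics it evaluates.

```python
import random

random.seed(0)
SIZE = 32  # toy grayscale image, SIZE x SIZE, values in [0, 1]

def make_clean_image():
    # A smooth horizontal gradient stands in for a clean source image.
    return [[x / (SIZE - 1) for x in range(SIZE)] for _ in range(SIZE)]

def edit(img):
    # Hypothetical single editing round: each pass injects a little
    # Gaussian noise, mimicking the artifact accumulation the paper studies.
    return [[min(1.0, max(0.0, v + random.gauss(0, 0.02))) for v in row]
            for row in img]

def nr_iqa_score(img):
    # Toy no-reference score: mean absolute horizontal gradient.
    # A stand-in for a real NR-IQA metric, used only to show the protocol.
    total, count = 0.0, 0
    for row in img:
        for a, b in zip(row, row[1:]):
            total += abs(b - a)
            count += 1
    return total / count

img = make_clean_image()
scores = [nr_iqa_score(img)]
for _ in range(100):  # 100 iterative editing rounds, as in Banana100
    img = edit(img)
    scores.append(nr_iqa_score(img))

# A reliable metric should score the 100-round image lower than the clean
# one; the paper reports that state-of-the-art NR-IQA metrics fail this check.
print(f"clean: {scores[0]:.4f}, after 100 rounds: {scores[-1]:.4f}")
```

The check at the end is the paper's core diagnostic: compare the metric's score on the round-100 image against the round-0 image and ask whether degradation is reflected at all.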
Problem

Research questions and friction points this paper is trying to address.

iterative image editing
image quality degradation
no-reference image quality assessment
multi-modal agentic systems
artifact accumulation
Innovation

Methods, ideas, or system contributions that make the work stand out.

iterative image degradation
multi-turn editing
no-reference image quality assessment
Banana100 dataset
multimodal agentic systems
Authors

Kenan Tang (University of California, Santa Barbara)
Praveen Arunshankar (University of California, Santa Barbara)
Andong Hua (University of California, Santa Barbara; Robust AI)
Anthony Yang (University of California, Santa Barbara)
Yao Qin (UCSB & Google DeepMind)
Machine Learning · Computer Vision · Natural Language Processing