Banana100: Breaking NR-IQA Metrics by 100 Iterative Image Replications with Nano Banana Pro

📅 2026-04-03
🤖 AI Summary
This work addresses the implicit image quality degradation that accumulates during multi-round image editing, a phenomenon poorly detected by existing no-reference image quality assessment (NR-IQA) methods. To investigate this issue, the authors construct Banana100, a dataset comprising 28,000 images generated through 100 iterative editing rounds using multimodal agents, enabling systematic analysis of quality decay patterns. The study reveals, for the first time, the widespread failure of NR-IQA metrics in iteratively generated scenarios. Experimental results demonstrate that 21 state-of-the-art NR-IQA methods consistently fail to identify severely degraded images, highlighting a dual vulnerability in both generative and evaluative systems. The authors release all code and data to establish a new benchmark for robust image generation and quality assessment.
📝 Abstract
The multi-step, iterative image editing capabilities of multi-modal agentic systems have transformed digital content creation. Although the latest image editing models faithfully follow instructions and generate high-quality images in single-turn edits, we identify a critical weakness in multi-turn editing: the iterative degradation of image quality. As images are repeatedly edited, minor artifacts accumulate, rapidly leading to severe visible noise and failures to follow even simple editing instructions. To systematically study these failures, we introduce Banana100, a comprehensive dataset of 28,000 degraded images generated through 100 iterative editing steps, covering diverse textures and image content. Alarmingly, image quality evaluators fail to detect the degradation: among 21 popular no-reference image quality assessment (NR-IQA) metrics, none consistently assigns lower scores to heavily degraded images than to clean ones. These dual failures of generators and evaluators may threaten the stability of future model training and the safety of deployed agentic systems if low-quality synthetic data generated by multi-turn edits escapes quality filters. We release the full code and data to facilitate the development of more robust models, helping to mitigate the fragility of multi-modal agentic systems.
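The evaluation protocol the abstract describes, i.e. repeatedly re-editing an image and scoring every round with an NR-IQA metric, can be sketched as below. This is a toy illustration only: `edit` (a noise-injecting stand-in for an editing model) and `nr_iqa_score` (a crude gradient-energy sharpness proxy) are hypothetical placeholders, not the paper's actual agents or any of the 21 metrics it evaluates.

```python
import random

random.seed(0)
SIZE = 32  # toy grayscale image, SIZE x SIZE, values in [0, 1]

def make_clean_image():
    # A smooth horizontal gradient stands in for a clean source image.
    return [[x / (SIZE - 1) for x in range(SIZE)] for _ in range(SIZE)]

def edit(img):
    # Hypothetical single editing round: each pass injects a little
    # Gaussian noise, mimicking the artifact accumulation the paper studies.
    return [[min(1.0, max(0.0, v + random.gauss(0, 0.02))) for v in row]
            for row in img]

def nr_iqa_score(img):
    # Toy no-reference score: mean absolute horizontal gradient.
    # A stand-in for a real NR-IQA metric, used only to show the protocol.
    total, count = 0.0, 0
    for row in img:
        for a, b in zip(row, row[1:]):
            total += abs(b - a)
            count += 1
    return total / count

img = make_clean_image()
scores = [nr_iqa_score(img)]
for _ in range(100):  # 100 iterative editing rounds, as in Banana100
    img = edit(img)
    scores.append(nr_iqa_score(img))

# A reliable metric should score the 100-round image lower than the clean
# one; the paper reports that state-of-the-art NR-IQA metrics fail this check.
print(f"clean: {scores[0]:.4f}, after 100 rounds: {scores[-1]:.4f}")
```

The check at the end is the paper's core diagnostic: compare the metric's score on the round-100 image against the round-0 image and ask whether degradation is reflected at all.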
Problem

Research questions and friction points this paper is trying to address.

iterative image editing
image quality degradation
no-reference image quality assessment
multi-modal agentic systems
artifact accumulation
Innovation

Methods, ideas, or system contributions that make the work stand out.

iterative image degradation
multi-turn editing
no-reference image quality assessment
Banana100 dataset
multimodal agentic systems
Authors

Kenan Tang (University of California, Santa Barbara)
Praveen Arunshankar (University of California, Santa Barbara)
Andong Hua (University of California, Santa Barbara; Robust AI)
Anthony Yang (University of California, Santa Barbara)
Yao Qin (UCSB & Google DeepMind)
Machine Learning · Computer Vision · Natural Language Processing