Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing

📅 2025-10-22

🤖 AI Summary

Current text-guided image editing research is hindered by the scarcity of large-scale, high-quality, publicly available editing datasets built from real images. To address this, we introduce Pico-Banana-400K, a dataset of 400K instruction-driven editing samples synthesized from real photographs in the OpenImages collection, covering complex tasks including multi-turn editing, preference alignment, and instruction rewriting. We propose a systematic quality control framework and a fine-grained taxonomy of edit types, and curate three specialized subsets (a 72K-example multi-turn collection, a 56K-example preference subset, and paired short/long instructions) to increase diversity and research utility. Samples are generated with the Nano-Banana model and filtered by multimodal large language models (MLLMs), which score each edit automatically for selection. This MLLM-based scoring is used to enforce content preservation and instruction faithfulness. Pico-Banana-400K aims to combine scale, photorealism, and broad task coverage, providing a scalable resource for training and evaluating next-generation image editing models.
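The MLLM-based selection step can be pictured as scoring each candidate edit on a few criteria and keeping only those whose combined score clears a cutoff. The sketch below is a minimal illustration under stated assumptions: the two criteria come from the abstract (content preservation and instruction faithfulness), but the equal weights, the 0.7 threshold, the [0, 1] score range, and the names `ScoredEdit`, `overall_score`, and `select` are all hypothetical and not taken from the paper.

```python
from dataclasses import dataclass

# Hypothetical criterion weights; the source does not say how scores are combined.
WEIGHTS = {
    "instruction_faithfulness": 0.5,
    "content_preservation": 0.5,
}
THRESHOLD = 0.7  # hypothetical acceptance cutoff


@dataclass
class ScoredEdit:
    edit_id: str
    # Per-criterion scores in [0, 1], e.g. as returned by an MLLM judge (assumed).
    scores: dict[str, float]


def overall_score(edit: ScoredEdit) -> float:
    """Weighted sum of the per-criterion scores; missing criteria count as 0."""
    return sum(w * edit.scores.get(name, 0.0) for name, w in WEIGHTS.items())


def select(
    edits: list[ScoredEdit], threshold: float = THRESHOLD
) -> tuple[list[ScoredEdit], list[ScoredEdit]]:
    """Split candidate edits into accepted and rejected lists by overall score."""
    accepted: list[ScoredEdit] = []
    rejected: list[ScoredEdit] = []
    for e in edits:
        (accepted if overall_score(e) >= threshold else rejected).append(e)
    return accepted, rejected


edits = [
    ScoredEdit("a", {"instruction_faithfulness": 0.9, "content_preservation": 0.8}),
    ScoredEdit("b", {"instruction_faithfulness": 0.4, "content_preservation": 0.9}),
]
accepted, rejected = select(edits)
print([e.edit_id for e in accepted], [e.edit_id for e in rejected])
```

Here edit "a" scores about 0.85 and is kept, while edit "b" scores 0.65 and is rejected: a well-preserved image does not compensate for an instruction that was not followed. Keeping the rejected list, rather than discarding it, is a design choice that leaves failed edits available for other uses.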

📝 Abstract

Recent advances in multimodal models have demonstrated remarkable text-guided image editing capabilities, with systems like GPT-4o and Nano-Banana setting new benchmarks. However, the research community's progress remains constrained by the absence of large-scale, high-quality, and openly accessible datasets built from real images. We introduce Pico-Banana-400K, a comprehensive 400K-image dataset for instruction-based image editing. Our dataset is constructed by leveraging Nano-Banana to generate diverse edit pairs from real photographs in the OpenImages collection. What distinguishes Pico-Banana-400K from previous synthetic datasets is our systematic approach to quality and diversity. We employ a fine-grained image editing taxonomy to ensure comprehensive coverage of edit types while maintaining precise content preservation and instruction faithfulness through MLLM-based quality scoring and careful curation. Beyond single-turn editing, Pico-Banana-400K enables research into complex editing scenarios. The dataset includes three specialized subsets: (1) a 72K-example multi-turn collection for studying sequential editing, reasoning, and planning across consecutive modifications; (2) a 56K-example preference subset for alignment research and reward model training; and (3) paired long-short editing instructions for developing instruction rewriting and summarization capabilities. By providing this large-scale, high-quality, and task-rich resource, Pico-Banana-400K establishes a robust foundation for training and benchmarking the next generation of text-guided image editing models.
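To make the three subsets concrete, the sketch below models one plausible record shape for each, plus the basic single-turn edit. It is illustrative only: the class and field names (`SingleTurnEdit`, `MultiTurnSession`, `PreferencePair`, `InstructionPair`, `edit_type`, and so on) are hypothetical and do not describe the dataset's actual file format. The one behavior it encodes comes from the abstract's description of multi-turn editing: each turn applies a new instruction to the output of the previous turn.

```python
from dataclasses import dataclass, field


@dataclass
class SingleTurnEdit:
    source_image: str  # path or ID of the input image (hypothetical field)
    instruction: str
    edited_image: str
    edit_type: str  # label from the edit taxonomy (hypothetical field)


@dataclass
class MultiTurnSession:
    source_image: str  # the original real photograph
    turns: list[SingleTurnEdit] = field(default_factory=list)

    def add_turn(self, instruction: str, edited_image: str, edit_type: str) -> None:
        # Each turn edits the previous turn's output; the first turn edits the photo.
        prev = self.turns[-1].edited_image if self.turns else self.source_image
        self.turns.append(SingleTurnEdit(prev, instruction, edited_image, edit_type))


@dataclass
class PreferencePair:
    source_image: str
    instruction: str
    chosen_image: str  # preferred edit
    rejected_image: str  # dispreferred edit


@dataclass
class InstructionPair:
    long_instruction: str  # detailed instruction
    short_instruction: str  # concise rewrite of the same edit


session = MultiTurnSession("photo_0001.jpg")
session.add_turn("make the sky pink", "turn1.png", "color")
session.add_turn("add a hat to the person", "turn2.png", "object")
print([t.source_image for t in session.turns])
```

The usage lines show the chaining: the second turn's input is `turn1.png`, not the original photo, which is what makes sequential editing and planning harder than independent single-turn edits.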

Problem

Research questions and friction points this paper is trying to address.

Addressing the lack of large-scale open datasets for text-guided image editing

Providing diverse edit types with quality control and content preservation

Enabling research on complex editing scenarios like sequential modifications

Innovation

Methods, ideas, or system contributions that make the work stand out.

Leveraging Nano-Banana to generate edit pairs from real images

Using fine-grained taxonomy and MLLM scoring for quality control

Providing specialized subsets for multi-turn and preference research
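The abstract does not say how the 56K preference examples are formed. One plausible construction, assumed here purely for illustration, reuses the quality filter: for the same source image and instruction, an edit that passed is paired as "chosen" with one that failed as "rejected". The `Attempt` class, its `passed` flag, and `build_preference_pairs` are hypothetical names, not part of the dataset or any released tool.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Attempt:
    source_image: str
    instruction: str
    output_image: str
    passed: bool  # True if the attempt cleared the quality filter (hypothetical field)


def build_preference_pairs(attempts: list[Attempt]) -> list[dict[str, str]]:
    """Pair passing and failing attempts that share an (image, instruction) key."""
    # Each key maps to (passing attempts, failing attempts).
    groups: defaultdict[tuple[str, str], tuple[list[Attempt], list[Attempt]]] = (
        defaultdict(lambda: ([], []))
    )
    for a in attempts:
        ok, bad = groups[(a.source_image, a.instruction)]
        (ok if a.passed else bad).append(a)

    pairs = []
    for (src, instr), (ok, bad) in groups.items():
        # zip pairs them one-to-one; keys lacking either side yield no pair.
        for chosen, rejected in zip(ok, bad):
            pairs.append({
                "source_image": src,
                "instruction": instr,
                "chosen": chosen.output_image,
                "rejected": rejected.output_image,
            })
    return pairs


attempts = [
    Attempt("x.jpg", "add a dog", "out1.png", True),
    Attempt("x.jpg", "add a dog", "out2.png", False),
    Attempt("y.jpg", "blur background", "out3.png", True),
]
print(build_preference_pairs(attempts))
```

Only `x.jpg` yields a pair; `y.jpg` has no failed attempt to contrast with, so it contributes nothing. Pairs of this shape are the usual input for reward model training or preference-based alignment, which is the use the abstract names for this subset.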