🤖 AI Summary
Existing image editing datasets generally suffer from limited scale, diversity, and quality, often relying on closed-source models or fixed synthesis pipelines that struggle to balance cost, generalization, and performance. To address these limitations, this work proposes ScaleEditor—the first fully open-source, hierarchical multi-agent framework that establishes an end-to-end scalable data generation pipeline. The framework integrates world-knowledge-injected source image expansion, adaptive multi-agent instruction–image synthesis, and task-aware quality validation. Leveraging this approach, we release ScaleEdit-12M, the largest open-source image editing dataset to date, encompassing 23 editing tasks. Models trained on this dataset achieve substantial performance gains across multiple benchmarks, with improvements of up to 150.0%.
📝 Abstract
Instruction-based image editing has emerged as a key capability for unified multimodal models (UMMs), yet constructing large-scale, diverse, and high-quality editing datasets without costly proprietary APIs remains challenging. Previous image editing datasets either rely on closed-source models for annotation, which prevents cost-effective scaling, or employ fixed synthetic editing pipelines, which suffer from limited quality and generalizability. To address these challenges, we propose ScaleEditor, a fully open-source hierarchical multi-agent framework for end-to-end construction of large-scale, high-quality image editing datasets. Our pipeline consists of three key components: source image expansion with world-knowledge infusion, adaptive multi-agent editing instruction–image synthesis, and a task-aware data quality verification mechanism. Using ScaleEditor, we curate ScaleEdit-12M, the largest open-source image editing dataset to date, spanning 23 task families across diverse real and synthetic domains. Fine-tuning UniWorld-V1 and Bagel on ScaleEdit-12M yields consistent gains, improving performance by up to 10.4% on ImgEdit and 35.1% on GEdit for general editing benchmarks, and by up to 150.0% on RISE and 26.5% on KRIS-Bench for knowledge-infused benchmarks. These results demonstrate that open-source, agentic pipelines can approach commercial-grade data quality while retaining cost-effectiveness and scalability. Both the framework and dataset will be open-sourced.
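The three-stage pipeline described in the abstract can be sketched as a minimal Python skeleton. Everything here is illustrative: the function names, the `EditSample` fields, and the stage logic are assumptions for exposition, not the paper's actual implementation (which would invoke multi-agent model calls at each stage).

```python
from dataclasses import dataclass

@dataclass
class EditSample:
    """One editing triple: source image, instruction, edited result."""
    source_image: str
    instruction: str
    edited_image: str
    task: str
    verified: bool = False

def expand_sources(seed_images):
    """Stage 1 (hypothetical): expand seed images with world-knowledge context.
    In the real pipeline this would query knowledge-infused agents; here we
    just attach a placeholder context string per seed."""
    return [(img, f"context-for-{img}") for img in seed_images]

def synthesize_edits(sources):
    """Stage 2 (hypothetical): multi-agent instruction-image synthesis.
    A placeholder stands in for the adaptive agents that would draft an
    instruction and render the corresponding edited image."""
    samples = []
    for img, ctx in sources:
        samples.append(EditSample(
            source_image=img,
            instruction=f"edit {img} guided by {ctx}",
            edited_image=f"{img}.edited",
            task="attribute_change",  # one of the 23 task families
        ))
    return samples

def verify(samples):
    """Stage 3 (hypothetical): task-aware quality verification.
    A trivial rule replaces the paper's learned/agentic checks."""
    for s in samples:
        s.verified = bool(s.instruction) and s.edited_image.endswith(".edited")
    return [s for s in samples if s.verified]

# End-to-end run over a toy seed set.
dataset = verify(synthesize_edits(expand_sources(["img_001.png", "img_002.png"])))
```

The point of the sketch is the dataflow, not the internals: each stage consumes the previous stage's output, so scaling the dataset only requires enlarging the seed pool and parallelizing the stages.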