REALEDIT: Reddit Edits As a Large-scale Empirical Dataset for Image Transformations

πŸ“… 2025-02-05
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
Existing image editing models rely on small, manually constructed datasets that lack the scale and ecological validity to cover authentic user intent. This paper introduces REALEdit, a large-scale image editing dataset built from real user requests (48K training / 9.3K test examples), curated from actual Reddit edit requests and their corresponding human-made edits. The authors develop a high-fidelity curation pipeline, train an end-to-end editing model on the data, and evaluate it with Elo scoring from human preference judgments, the automated VIEScore metric, and live deployment on Reddit. The REALEdit model gains up to 165 Elo points over the strongest baseline in human evaluation, achieves a 92% relative improvement on VIEScore, and receives positive deployment feedback. Transferred to deepfake detection, fine-tuning on REALEdit data improves F1-score by 14 percentage points.
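The human evaluation above ranks models by Elo score computed from pairwise preference judgments. A minimal sketch of a standard Elo update, assuming the usual logistic model with a 400-point scale and an illustrative K-factor of 32 (the paper's exact Elo configuration is not given here):

```python
# Minimal sketch of a pairwise Elo update for model comparisons.
# K=32 and the 1000-point base rating are illustrative assumptions,
# not values taken from the paper.

def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the logistic Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32):
    """Return updated ratings after one comparison.

    score_a is 1.0 if A is preferred, 0.0 if B is preferred, 0.5 for a tie.
    """
    e_a = expected_score(r_a, r_b)
    r_a_new = r_a + k * (score_a - e_a)
    r_b_new = r_b + k * ((1.0 - score_a) - (1.0 - e_a))
    return r_a_new, r_b_new
```

Under this model, a 165-point rating gap corresponds to roughly a 72% expected win rate for the higher-rated model in a head-to-head comparison.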

πŸ“ Abstract
Existing image editing models struggle to meet real-world demands. Despite excelling in academic benchmarks, they have yet to be widely adopted for real user needs. Datasets that power these models use artificial edits, lacking the scale and ecological validity necessary to address the true diversity of user requests. We introduce REALEDIT, a large-scale image editing dataset with authentic user requests and human-made edits sourced from Reddit. REALEDIT includes a test set of 9300 examples to evaluate models on real user requests. Our results show that existing models fall short on these tasks, highlighting the need for realistic training data. To address this, we introduce 48K training examples and train our REALEDIT model, achieving substantial gains: outperforming competitors by up to 165 Elo points in human judgment and a 92 percent relative improvement on the automated VIEScore metric. We deploy our model on Reddit, testing it on new requests, and receive positive feedback. Beyond image editing, we explore REALEDIT's potential in detecting edited images by partnering with a deepfake detection non-profit. Finetuning their model on REALEDIT data improves its F1-score by 14 percentage points, underscoring the dataset's value for broad applications.
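The deepfake-detection transfer result is reported as an F1-score gain. As a reminder of the metric, a minimal sketch from confusion-matrix counts (the counts below are illustrative, not the paper's results):

```python
# F1 is the harmonic mean of precision and recall, computed from
# true positives (tp), false positives (fp), and false negatives (fn).
# The example counts are made up for illustration only.

def f1_score(tp: int, fp: int, fn: int) -> float:
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# With 80 true positives, 20 false positives, and 20 false negatives,
# precision and recall are both 0.8, so F1 = 0.8.
```

A 14-percentage-point improvement means, for example, moving from an F1 of 0.66 to 0.80 on the same test set.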
Problem

Research questions and friction points this paper is trying to address.

Unmet real-world image editing demands
Lack of authentic large-scale datasets
Need for realistic training data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large-scale image editing dataset
Authentic user requests integration
Enhanced deepfake detection capabilities