NEP: Autoregressive Image Editing via Next Editing Token Prediction

📅 2025-08-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing text-guided image editing methods regenerate the entire image, which wastes computation and can degrade regions that were never meant to change. This work models editing as Next Editing-token Prediction (NEP) on top of autoregressive image generation: only the tokens in the regions to be edited are regenerated, with no global reconstruction. The key contributions are: (i) an any-order autoregressive text-to-image (T2I) pre-training framework that enables zero-shot editing of arbitrary regions and can be adapted to NEP; (ii) test-time iterative optimization and token expansion strategies for improved fidelity and flexibility. The method reports a new state of the art on widely used image editing benchmarks while avoiding the computation spent on regenerating non-editing regions. Because those regions are never regenerated, their content and geometry are preserved better than with full-image generation approaches.

📝 Abstract

Text-guided image editing involves modifying a source image based on a language instruction and, typically, requires changes to only small local regions. However, existing approaches generate the entire target image rather than selectively regenerating only the intended editing areas. This results in (1) unnecessary computational costs and (2) a bias toward reconstructing non-editing regions, which compromises the quality of the intended edits. To resolve these limitations, we propose to formulate image editing as Next Editing-token Prediction (NEP) based on autoregressive image generation, where only regions that need to be edited are regenerated, thus avoiding unintended modifications to the non-editing areas. To enable any-region editing, we propose to pre-train an any-order autoregressive text-to-image (T2I) model. Once trained, it is capable of zero-shot image editing and can be easily adapted to NEP for image editing, which achieves a new state-of-the-art on widely used image editing benchmarks. Moreover, our model naturally supports test-time scaling (TTS) through iteratively refining its generation in a zero-shot manner. The project page is: https://nep-bigai.github.io/
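
The core idea of regenerating only the editing tokens can be illustrated with a toy decoding loop. This is a minimal sketch under stated assumptions, not the paper's implementation: the image is assumed to be a flat list of discrete token ids, `toy_predict` is a hypothetical stand-in for the any-order autoregressive T2I model, and the random visiting order is only one possible way to realize "any-order" decoding.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical predictor signature: (current tokens, position, instruction) -> token id.
Predictor = Callable[[List[int], int, str], int]


def toy_predict(tokens: List[int], position: int, instruction: str) -> int:
    """Hypothetical stand-in for the model; returns a deterministic token id."""
    return (len(instruction) + position) % 1024


def edit_tokens(
    source_tokens: Sequence[int],
    edit_positions: Sequence[int],
    instruction: str,
    predict: Predictor = toy_predict,
    seed: int = 0,
) -> Tuple[List[int], int]:
    """Regenerate only the tokens at edit_positions; copy everything else."""
    tokens = list(source_tokens)
    order = list(edit_positions)
    # Assumption: visit the editing positions in an arbitrary (shuffled) order.
    random.Random(seed).shuffle(order)
    calls = 0
    for pos in order:
        # Each prediction sees the untouched tokens plus earlier regenerated ones.
        tokens[pos] = predict(tokens, pos, instruction)
        calls += 1
    return tokens, calls


source = list(range(16))          # a 4x4 token grid, flattened
region = [5, 6, 9, 10]            # the 2x2 block to edit
edited, calls = edit_tokens(source, region, "make the sky red")
print(calls, "model calls instead of", len(source))
```

In this sketch the number of model calls equals the size of the editing region rather than the size of the image, and every token outside the region is carried over unchanged, which mirrors the two benefits the abstract claims.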

Problem

Research questions and friction points this paper is trying to address.

Selective regeneration of intended image edit areas

Reducing computational costs in image editing

Avoiding unintended modifications to non-editing regions

Innovation

Methods, ideas, or system contributions that make the work stand out.

Autoregressive editing token prediction for selective regeneration

Pre-trained any-order autoregressive T2I model

Zero-shot editing with test-time iterative refinement
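
Test-time iterative refinement can be pictured as a loop that repeatedly regenerates the editing region and keeps a candidate only when it improves. The sketch below is an assumption about the general shape of such a loop, not the procedure from the paper: `toy_regenerate` and `toy_score` are hypothetical placeholders for the model's regeneration step and for whatever quality signal guides refinement.

```python
from typing import Callable, List, Sequence, Tuple

Regenerate = Callable[[List[int], Sequence[int]], List[int]]
Score = Callable[[List[int]], float]


def refine(
    tokens: Sequence[int],
    edit_positions: Sequence[int],
    regenerate: Regenerate,
    score: Score,
    rounds: int = 3,
) -> Tuple[List[int], float]:
    """Keep the best-scoring result over several regeneration rounds."""
    best = list(tokens)
    best_score = score(best)
    for _ in range(rounds):
        candidate = regenerate(best, edit_positions)
        candidate_score = score(candidate)
        # Accept a candidate only if it strictly improves the score.
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best, best_score


def toy_regenerate(tokens: List[int], edit_positions: Sequence[int]) -> List[int]:
    """Hypothetical regeneration: bump each editing token by one."""
    out = list(tokens)
    for pos in edit_positions:
        out[pos] += 1
    return out


def toy_score(tokens: List[int]) -> float:
    """Hypothetical quality score: closeness of every token to the value 3."""
    return -float(sum(abs(t - 3) for t in tokens))


refined, final_score = refine([0, 0, 0, 0], [1, 2], toy_regenerate, toy_score, rounds=5)
print(refined, final_score)
```

More rounds mean more compute spent at inference time, which is the trade-off that test-time scaling exposes; in this toy setup, extra rounds stop helping once the editing tokens reach the scorer's optimum.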
