From Scale to Speed: Adaptive Test-Time Scaling for Image Editing

📅 2026-02-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses three key challenges in Chain-of-Thought (CoT) approaches for image editing—inefficient resource allocation, unreliable early-stage verification, and redundant outputs—by proposing ADE-CoT, a novel framework that introduces the first adaptive test-time scaling mechanism tailored for image editing. ADE-CoT enhances efficiency and performance through difficulty-aware dynamic budget allocation, edit-specific pruning based on spatial localization and description consistency, and an intent-alignment-driven opportunistic early termination strategy. Extensive experiments demonstrate that ADE-CoT achieves over 2× acceleration across three state-of-the-art editing models and benchmarks while surpassing the edit quality of Best-of-N under the same computational budget.

📝 Abstract
Image Chain-of-Thought (Image-CoT) is a test-time scaling paradigm that improves image generation by extending inference time. Most Image-CoT methods focus on text-to-image (T2I) generation. Unlike T2I generation, image editing is goal-directed: the solution space is constrained by the source image and instruction. This mismatch causes three challenges when applying Image-CoT to editing: inefficient resource allocation with fixed sampling budgets, unreliable early-stage verification using general MLLM scores, and redundant edited results from large-scale sampling. To address these challenges, we propose ADaptive Edit-CoT (ADE-CoT), an on-demand test-time scaling framework that enhances editing efficiency and performance. It incorporates three key strategies: (1) difficulty-aware resource allocation that assigns dynamic budgets based on estimated edit difficulty; (2) edit-specific verification in early pruning that uses region localization and caption consistency to select promising candidates; and (3) depth-first opportunistic stopping, guided by an instance-specific verifier, that terminates when intent-aligned results are found. Extensive experiments on three SOTA editing models (Step1X-Edit, BAGEL, FLUX.1 Kontext) across three benchmarks show that ADE-CoT achieves superior performance-efficiency trade-offs. With comparable sampling budgets, ADE-CoT obtains better performance with more than 2× speedup over Best-of-N.
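The three strategies in the abstract fit together as a single sampling loop: estimate difficulty, size the budget, cheaply prune candidates, and stop as soon as an intent-aligned result appears. The sketch below is a minimal illustration under assumed interfaces; the function names, thresholds, and difficulty proxy are hypothetical stand-ins, not the paper's actual implementation.

```python
def estimate_difficulty(instruction: str) -> float:
    """Hypothetical proxy: longer, multi-clause instructions count as harder (0..1)."""
    return min(1.0, len(instruction.split()) / 20)

def budget_for(difficulty: float, min_budget: int = 2, max_budget: int = 16) -> int:
    """Difficulty-aware allocation: harder edits receive larger sampling budgets."""
    return min_budget + round(difficulty * (max_budget - min_budget))

def adaptive_edit_cot(instruction, generate, prune_score, verify,
                      prune_threshold=0.5, accept_threshold=0.9):
    """Depth-first sampling with early pruning and opportunistic stopping.

    generate(instruction)    -> one edited candidate (assumed interface)
    prune_score(candidate)   -> cheap edit-specific check (stand-in for region
                                localization / caption consistency)
    verify(candidate)        -> expensive instance-specific verifier score
    """
    budget = budget_for(estimate_difficulty(instruction))
    best, best_score = None, -1.0
    for _ in range(budget):
        candidate = generate(instruction)
        # Early pruning: skip the costly verifier for unpromising candidates.
        if prune_score(candidate) < prune_threshold:
            continue
        score = verify(candidate)
        if score > best_score:
            best, best_score = candidate, score
        if score >= accept_threshold:
            break  # opportunistic early termination: intent-aligned result found
    return best, best_score
```

With toy scoring functions, the loop prunes a weak candidate, keeps a mediocre one, and halts as soon as a high-scoring edit is verified, illustrating where the speedup over fixed-budget Best-of-N sampling comes from.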
Problem

Research questions and friction points this paper is trying to address.

Image Editing
Test-Time Scaling
Resource Allocation
Early Verification
Redundant Sampling
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive Test-Time Scaling
Image Editing
Chain-of-Thought
Efficiency-Performance Trade-off
Instance-Specific Verification
👥 Authors
Xiangyan Qu
IIE
Zhenlong Yuan
AMAP, Alibaba Group
Jing Tang
AMAP, Alibaba Group
Rui Chen
AMAP, Alibaba Group; Tsinghua University
Computer Vision; Pattern Recognition
Datao Tang
AMAP, Alibaba Group
Meng Yu
AMAP, Alibaba Group
Lei Sun
AMAP, Alibaba Group
Yancheng Bai
AMAP, Alibaba Group
Xiangxiang Chu
AMAP, Alibaba Group
Gaopeng Gou
Institute of Information Engineering, Chinese Academy of Sciences; School of Cyber Security, University of Chinese Academy of Sciences
Gang Xiong
Institute of Automation, Chinese Academy of Sciences
Intelligent Control and Management; Intelligent Transportation Systems; Intelligent Manufacturing
Yujun Cai
NTU → Meta → Lecturer (Assistant Professor) @ UQ
Multi-Modal Perception; Vision-Language Models