CAMEO: A Conditional and Quality-Aware Multi-Agent Image Editing Orchestrator

📅 2026-04-03
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the limitations of existing conditional image editing methods, which typically rely on single-step generation and lack explicit quality control, often resulting in structural distortions, contextual inconsistencies, and excessive deviation from the original image. To overcome these issues, we propose CAMEO, a quality-aware, feedback-driven multi-agent collaborative editing framework that introduces, for the first time, an integrated quality assessment and iterative feedback mechanism. CAMEO enables closed-loop optimization through coordinated planning, structured prompting, hypothesis generation, and adaptive reference fusion. The framework is compatible with mainstream editing backbones and outperforms multiple state-of-the-art approaches, with an average win-rate gain of 20% on tasks such as anomaly insertion and human pose transfer, improving controllability, robustness, and structural consistency in image editing.
πŸ“ Abstract
Conditional image editing aims to modify a source image according to textual prompts and optional reference guidance. Such editing is crucial in scenarios requiring strict structural control (e.g., anomaly insertion in driving scenes and complex human pose transformation). Despite recent advances in large-scale editing models (e.g., Seedream and Nano Banana), most approaches rely on single-step generation. This paradigm often lacks explicit quality control, may introduce excessive deviation from the original image, and frequently produces structural artifacts or environment-inconsistent modifications, typically requiring manual prompt tuning to achieve acceptable results. We propose CAMEO, a structured multi-agent framework that reformulates conditional editing as a quality-aware, feedback-driven process rather than a one-shot generation task. CAMEO decomposes editing into coordinated stages of planning, structured prompting, hypothesis generation, and adaptive reference grounding, where external guidance is invoked only when task complexity requires it. To address the lack of intrinsic quality control in existing methods, evaluation is embedded directly within the editing loop: intermediate results are iteratively refined through structured feedback, forming a closed-loop process that progressively corrects structural and contextual inconsistencies. We evaluate CAMEO on anomaly insertion and human pose switching tasks. Across multiple strong editing backbones and independent evaluation models, CAMEO achieves on average a 20% higher win rate than multiple state-of-the-art models, demonstrating improved robustness, controllability, and structural reliability in conditional image editing.
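The abstract describes editing as a closed loop of planning, structured prompting, hypothesis generation, conditional reference grounding, and embedded evaluation. The following is a minimal Python sketch of that control flow under stated assumptions, not CAMEO's implementation: every agent (planner, prompter, generator, evaluator, reference retriever) is a hypothetical callable, the `threshold`, `max_rounds`, and `num_hypotheses` parameters are invented for illustration, and the planner's `"complex"` flag is our assumption about how "external guidance is invoked only when task complexity requires it" might be expressed. The toy agents just manipulate strings so the loop runs end to end.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical agent interfaces (illustrative only, not CAMEO's real APIs).
Planner = Callable[[str, str], dict]                       # (source, instruction) -> plan
Prompter = Callable[[dict, str], str]                      # (plan, feedback) -> structured prompt
Generator = Callable[[str, str, Optional[str], int], str]  # (source, prompt, reference, seed) -> candidate
Evaluator = Callable[[str, str, str], tuple[float, str]]   # (source, candidate, instruction) -> (score, critique)
Retriever = Callable[[str], str]                           # instruction -> reference guidance


@dataclass
class EditResult:
    image: str        # stand-in for an edited-image handle
    score: float      # evaluator score, assumed to lie in [0, 1]
    critique: str     # structured feedback attached to this candidate
    round_found: int  # refinement round that produced this candidate


def closed_loop_edit(source: str, instruction: str, plan: Planner, build_prompt: Prompter,
                     generate: Generator, evaluate: Evaluator, retrieve_reference: Retriever,
                     num_hypotheses: int = 2, threshold: float = 0.8,
                     max_rounds: int = 3) -> EditResult:
    """Plan, prompt, sample hypotheses, score them, and refine until the score passes."""
    if max_rounds < 1 or num_hypotheses < 1:
        raise ValueError("max_rounds and num_hypotheses must be >= 1")
    plan_out = plan(source, instruction)
    # Adaptive grounding (assumed rule): fetch a reference only for tasks flagged complex.
    reference = retrieve_reference(instruction) if plan_out.get("complex") else None
    feedback = ""
    best: Optional[EditResult] = None
    for round_idx in range(1, max_rounds + 1):
        prompt = build_prompt(plan_out, feedback)
        # Hypothesis generation: several candidates per round, varied here only by seed.
        for seed in range(num_hypotheses):
            candidate = generate(source, prompt, reference, seed)
            score, critique = evaluate(source, candidate, instruction)
            if best is None or score > best.score:
                best = EditResult(candidate, score, critique, round_idx)
        if best.score >= threshold:
            break  # quality gate passed; stop refining
        feedback = best.critique  # close the loop: the critique shapes the next prompt
    return best


# Toy stand-ins so the loop runs end to end; real agents would call models.
def toy_plan(source: str, instruction: str) -> dict:
    return {"goal": instruction, "complex": "pose" in instruction}

def toy_prompt(plan: dict, feedback: str) -> str:
    return plan["goal"] + (f" [fix: {feedback}]" if feedback else "")

def toy_generate(source: str, prompt: str, reference: Optional[str], seed: int) -> str:
    return f"{source}<{prompt}|ref={reference}|seed={seed}>"

def toy_evaluate(source: str, candidate: str, instruction: str) -> tuple[float, str]:
    # Pretend feedback-refined prompts score higher; seed 1 gets a tiny bonus.
    score = (0.9 if "[fix:" in candidate else 0.6) + (0.01 if candidate.endswith("seed=1>") else 0.0)
    return score, "ok" if score >= 0.8 else "limb proportions inconsistent"

def toy_reference(instruction: str) -> str:
    return "ref:pose-skeleton"


if __name__ == "__main__":
    result = closed_loop_edit("src.png", "change to a running pose", toy_plan, toy_prompt,
                              toy_generate, toy_evaluate, toy_reference)
    print(result.round_found, round(result.score, 2))
```

Running the module prints `2 0.91`: both first-round hypotheses fall below the threshold, their critique is folded into the second-round prompt, and a refined hypothesis passes the quality gate. Keeping the best-scoring candidate across all rounds, rather than only the last one, is one possible design choice; it ensures the loop never returns something worse than a candidate it has already seen.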
Problem

Research questions and friction points this paper is trying to address.

conditional image editing
quality control
structural consistency
one-shot generation
image artifacts
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-agent framework
quality-aware editing
feedback-driven refinement
conditional image editing
structured prompting
πŸ”Ž Similar Papers
No similar papers found.
Authors
Yuhan Pu
The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, Guangdong, China
Hao Zheng
Harbin Institute of Technology, Weihai, Shandong, China
Ziqian Mo
Shenzhen University, Shenzhen, Guangdong, China
Hill Zhang
Claremont McKenna College, Claremont, California, USA
Tianyi Fan
Research Institute of Petroleum Exploration and Development, CNPC, Beijing, China
Shuhong Wu
Research Institute of Petroleum Exploration and Development, CNPC, Beijing, China
Jiaheng Wei
Assistant Professor, The Hong Kong University of Science and Technology (Guangzhou)
Label Noise, Weakly Supervised Learning, Large Language Models