🤖 AI Summary
This work addresses the prevalence of visual artifacts in current AI-generated images and the limitations of existing correction methods, which rely on costly and non-scalable human-annotated data. To overcome this, the authors propose ArtiAgent, a novel multi-agent framework that automates artifact synthesis and annotation without manual labeling. The framework employs a perception agent to identify entities in real images, injects controllable artifacts via patch-wise embedding manipulation within a diffusion transformer, and generates both local and global explanations for each instance. This approach efficiently produces 100,000 high-quality, precisely annotated artifact images, demonstrating strong fidelity, scalability, and generalizability across multiple downstream tasks.
📝 Abstract
Despite recent advances in diffusion models, AI-generated images still often contain visual artifacts that compromise realism. Although more extensive pre-training and larger models may reduce artifacts, there is no guarantee that they can be eliminated entirely, making artifact mitigation a crucial area of study. Previous artifact-aware methods depend on human-labeled artifact datasets, which are costly and difficult to scale, underscoring the need for an automated way to reliably acquire artifact-annotated datasets. In this paper, we propose ArtiAgent, which efficiently creates pairs of real and artifact-injected images. It comprises three agents: a perception agent that recognizes and grounds entities and subentities in real images; a synthesis agent that introduces artifacts via artifact-injection tools built on novel patch-wise embedding manipulation within a diffusion transformer; and a curation agent that filters the synthesized artifacts and generates both local and global explanations for each instance. Using ArtiAgent, we synthesize 100K images with rich artifact annotations and demonstrate both efficacy and versatility across diverse applications. Code is available at link.
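To make the patch-wise embedding manipulation idea concrete, here is a minimal sketch of how one might perturb only the transformer token embeddings covering a target entity while leaving the rest of the image's tokens untouched. The function name `inject_patchwise_artifact`, the linear blending rule, and the Gaussian perturbation are illustrative assumptions for exposition; they are not the paper's actual injection tools.

```python
import numpy as np

def inject_patchwise_artifact(patch_emb, patch_mask, strength=0.5, seed=0):
    """Perturb only the patch embeddings selected by an entity mask.

    patch_emb  : (N, D) array of patch/token embeddings inside a
                 diffusion-transformer block (N patches, D dims).
    patch_mask : (N,) boolean array marking the patches that cover
                 the grounded entity to corrupt.
    strength   : blend factor in [0, 1]; 0 leaves embeddings intact,
                 1 replaces them entirely with noise.

    Returns a copy of patch_emb where only masked patches are blended
    with Gaussian noise, steering denoising toward a localized artifact.
    """
    rng = np.random.default_rng(seed)
    out = patch_emb.copy()
    # Scale the noise to the embeddings' own spread so the perturbation
    # stays in a plausible range of the embedding space.
    noise = rng.normal(scale=patch_emb.std(), size=patch_emb.shape)
    out[patch_mask] = ((1.0 - strength) * out[patch_mask]
                       + strength * noise[patch_mask])
    return out
```

Because unmasked patches are returned verbatim, the corruption is spatially confined to the grounded entity, which is what makes the resulting real/artifact image pairs precisely annotatable.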