ArtifactLens: Hundreds of Labels Are Enough for Artifact Detection with VLMs

📅 2026-02-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of efficiently detecting artifacts, such as malformed hands or distorted objects, in AI-generated images. Existing methods are hindered by a heavy reliance on large annotated datasets and by limited adaptability to rapidly evolving generative models and emerging artifact types. To overcome these limitations, the authors propose a few-shot artifact detection framework built on pre-trained vision-language models (VLMs). By combining in-context learning, optimized textual prompts, and a multi-component detection architecture, the method achieves state-of-the-art performance with only a few hundred labeled examples per artifact category. Notably, it is the first approach evaluated uniformly across five benchmarks, demonstrating strong generalization across morphological, anatomical, and interaction-based artifacts, as well as broader applicability to general AIGC detection tasks.

📝 Abstract
Modern image generators produce strikingly realistic images, where only artifacts like distorted hands or warped objects reveal their synthetic origin. Detecting these artifacts is essential: without detection, we cannot benchmark generators or train reward models to improve them. Current detectors fine-tune VLMs on tens of thousands of labeled images, but this is expensive to repeat whenever generators evolve or new artifact types emerge. We show that pretrained VLMs already encode the knowledge needed to detect artifacts - with the right scaffolding, this capability can be unlocked using only a few hundred labeled examples per artifact category. Our system, ArtifactLens, achieves state-of-the-art on five human artifact benchmarks (the first evaluation across multiple datasets) while requiring orders of magnitude less labeled data. The scaffolding consists of a multi-component architecture with in-context learning and text instruction optimization, with novel improvements to each. Our methods generalize to other artifact types - object morphology, animal anatomy, and entity interactions - and to the distinct task of AIGC detection.
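The abstract's core recipe, a few hundred labeled exemplars plus in-context learning over a pretrained VLM, can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's implementation: the `Exemplar` structure, the prompt wording, and the `vlm` callable (standing in for any vision-language model API) are all hypothetical assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Exemplar:
    image_ref: str   # placeholder reference to an image (path or handle)
    label: str       # "artifact" or "clean"
    note: str        # short rationale shown to the model as context

def build_fewshot_prompt(category: str, exemplars: List[Exemplar],
                         query_ref: str) -> str:
    """Assemble an in-context prompt: task instruction, labeled examples, query."""
    lines = [
        f"You are inspecting AI-generated images for {category} artifacts "
        "(e.g. malformed hands, warped objects).",
        "For each image answer exactly 'artifact' or 'clean'.",
        "",
    ]
    for ex in exemplars:
        lines.append(f"Image: {ex.image_ref}\nAnswer: {ex.label}  # {ex.note}")
    lines.append(f"Image: {query_ref}\nAnswer:")
    return "\n".join(lines)

def detect(query_ref: str, category: str, exemplars: List[Exemplar],
           vlm: Callable[[str], str]) -> bool:
    """Return True if the (pluggable) VLM flags the query image as artifacted."""
    reply = vlm(build_fewshot_prompt(category, exemplars, query_ref))
    return reply.strip().lower().startswith("artifact")
```

In this reading, "scaffolding" amounts to choices outside the frozen model: which exemplars to include per artifact category and how the text instruction is phrased (which the paper optimizes); swapping in a real VLM client for `vlm` is all that changes between a toy and a working detector.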
Problem

Research questions and friction points this paper is trying to address.

artifact detection
visual language models
AIGC detection
few-shot learning
image generation artifacts
Innovation

Methods, ideas, or system contributions that make the work stand out.

artifact detection
vision-language models
in-context learning
text instruction optimization
few-shot learning