🤖 AI Summary
Existing vision datasets lack out-of-context samples, hindering research in context-aware visual understanding and image forensics.
Method: We introduce COinCO, the first large-scale, systematically annotated dataset of in-context and out-of-context images (97,722 samples), generated by semantically controlled object replacement on COCO using diffusion models, with fine-grained plausibility annotation by multimodal large language models. We propose the novel “Objects-from-Context” prediction task and develop a context-enhanced zero-shot forgery detection framework that requires no fine-tuning, integrating diffusion-based inpainting, multimodal plausibility assessment, semantic prior modeling, and context-aware generative classification.
Results: Our approach achieves significant gains in context classification accuracy; establishes the first baseline for instance- and clique-level Objects-from-Context prediction; and delivers zero-shot, context-aware performance improvements to state-of-the-art forgery detectors.
📝 Abstract
We present Common Inpainted Objects In-N-Out of Context (COinCO), a novel dataset addressing the scarcity of out-of-context examples in existing vision datasets. By systematically replacing objects in COCO images through diffusion-based inpainting, we create 97,722 unique images featuring both contextually coherent and inconsistent scenes, enabling effective context learning. Each inpainted object is meticulously verified and categorized as in- or out-of-context through a multimodal large language model assessment. Our analysis reveals significant patterns in semantic priors that influence inpainting success across object categories. We demonstrate three key tasks enabled by COinCO: (1) training context classifiers that effectively determine whether existing objects belong in their context; (2) a novel Objects-from-Context prediction task that determines which new objects naturally belong in given scenes at both instance and clique levels; and (3) context-enhanced fake detection on state-of-the-art methods without fine-tuning. COinCO provides a controlled testbed with contextual variations, establishing a foundation for advancing context-aware visual understanding in computer vision and image forensics. Our code and data are at: https://github.com/YangTianze009/COinCO.