ORIDa: Object-centric Real-world Image Composition Dataset

📅 2025-06-10
🤖 AI Summary
Existing image composition datasets are limited in scale and diversity, and they rarely pair real-world images of an object with an image of the same scene without it. To address these limitations, we introduce ORIDa, a large-scale, real-captured, object-centric image composition dataset comprising over 30,000 images of 200 unique objects across diverse scenes and positions. ORIDa is built around "factual-counterfactual" sets of five images per scene: four factual images showing the object at different positions and one counterfactual (background) image without it. Additional factual-only scenes, each a single image of an object in context, broaden the range of environments. The dataset relies on real-world capture, object masks, and scene-level metadata organized under a structured grouping scheme. Extensive analysis and experiments indicate that ORIDa is a valuable resource for object compositing research, including object placement, illumination and shadow consistency, and edge blending.

📝 Abstract
Object compositing, the task of placing and harmonizing objects in images of diverse visual scenes, has become an important task in computer vision with the rise of generative models. However, existing datasets lack the diversity and scale required to comprehensively explore real-world scenarios. We introduce ORIDa (Object-centric Real-world Image Composition Dataset), a large-scale, real-captured dataset containing over 30,000 images featuring 200 unique objects, each of which is presented across varied positions and scenes. ORIDa has two types of data: factual-counterfactual sets and factual-only scenes. The factual-counterfactual sets consist of four factual images showing an object in different positions within a scene and a single counterfactual (or background) image of the scene without the object, resulting in five images per scene. The factual-only scenes include a single image containing an object in a specific context, expanding the variety of environments. To our knowledge, ORIDa is the first publicly available dataset with its scale and complexity for real-world image composition. Extensive analysis and experiments highlight the value of ORIDa as a resource for advancing further research in object compositing.
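The two data types described in the abstract can be pictured as simple records. The following is a minimal sketch, not the dataset's actual file layout or API: the class names, field names, and file paths are all hypothetical and chosen only to illustrate the "four factual images plus one background image" structure.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class FactualCounterfactualSet:
    """One scene: four factual images plus one object-free background image.

    All names here are hypothetical; ORIDa's real layout may differ.
    """
    object_id: str
    scene_id: str
    factual_images: List[str]   # paths: same object at four positions
    counterfactual_image: str   # path: same scene without the object

    def __post_init__(self) -> None:
        # The abstract states four factual images per set.
        if len(self.factual_images) != 4:
            raise ValueError("expected exactly four factual images per set")

    def all_images(self) -> List[str]:
        """All five images of the scene, background last."""
        return [*self.factual_images, self.counterfactual_image]

    def background_target_pairs(self) -> List[Tuple[str, str]]:
        """(background, factual) pairs, e.g. as compositing input/target."""
        return [(self.counterfactual_image, f) for f in self.factual_images]


@dataclass(frozen=True)
class FactualOnlyScene:
    """A single image of an object in context, with no background image."""
    object_id: str
    scene_id: str
    image: str


# Hypothetical example record with made-up paths.
example = FactualCounterfactualSet(
    object_id="obj_001",
    scene_id="scene_A",
    factual_images=[f"obj_001/scene_A/pos_{i}.jpg" for i in range(4)],
    counterfactual_image="obj_001/scene_A/background.jpg",
)
```

Pairing the background with each factual image is one plausible way to use such a set: the background serves as the compositing input and the factual image as a real-captured reference for the result.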
Problem

Research questions and friction points this paper is trying to address.

Lack of diverse large-scale datasets for real-world image compositing
Need for comprehensive exploration of object placement in varied scenes
Absence of publicly available real-world compositing datasets of comparable scale and complexity
Innovation

Methods, ideas, or system contributions that make the work stand out.

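Large-scale real-captured dataset with over 30,000 images of 200 unique objects
Factual-counterfactual sets: four factual images of an object at different positions plus one background image per scene
To the authors' knowledge, the first publicly available dataset of this scale and complexity for real-world image composition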
Jinwoo Kim
Yonsei University
Sangmin Han
Yonsei University
Jinho Jeong
Yonsei University
computer vision, super-resolution, generative AI
Jiwoo Choi
Yonsei University
Dongyoung Kim
Yonsei University
Seon Joo Kim
Dept. of Computer Science, Yonsei University, Seoul, Korea
Computer Vision, Computational Photography, Deep Learning