Multiple Object Stitching for Unsupervised Representation Learning

📅 2025-06-09

🤖 AI Summary

Existing contrastive learning methods achieve strong performance on single-object-centric images but degrade significantly on natural images containing multiple objects. To address this limitation, we propose Multiple Object Stitching (MOS), which synthesizes multi-object images in a controllable manner to establish cross-sample, object-level correspondences, enabling fine-grained object representation learning without human annotations. MOS integrates image stitching augmentation, a contrastive learning framework, self-supervised feature disentanglement, and object localization priors. Evaluated on ImageNet, CIFAR, and COCO, MOS achieves state-of-the-art performance in unsupervised representation learning. It also improves downstream performance on object detection and semantic segmentation, demonstrating its effectiveness and generalizability for modeling multi-object scenes.

📝 Abstract

Contrastive learning on single-object-centric images has achieved remarkable progress in unsupervised representation learning, but it performs poorly on the widely encountered images that contain multiple objects. In this paper, we propose a simple but effective method, Multiple Object Stitching (MOS), to refine unsupervised representations for multi-object images. Specifically, we construct multi-object images by stitching single-object-centric ones, so that the objects in each synthesized multi-object image are predetermined. Hence, compared with existing contrastive methods, our method provides additional object correspondences between multi-object images without human annotations. In this manner, our method pays more attention to the representation of each object in a multi-object image, thus providing more detailed representations for complex downstream tasks such as object detection and semantic segmentation. Experimental results on the ImageNet, CIFAR and COCO datasets demonstrate that our method achieves leading unsupervised representation performance on both single-object-centric and multi-object images. The source code is available at https://github.com/visresearch/MultipleObjectStitching.
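To make the stitching idea concrete, here is a minimal sketch that assumes four equal-sized single-object-centric images placed on a 2×2 grid. The helpers `stitch_grid` and `cell_correspondence` are hypothetical illustrations, not code from the authors' repository, and the layouts and augmentations MOS actually uses may differ. Because the placement order is fixed before stitching, the object identity of every cell is known, and comparing two stitchings of the same images yields object correspondences with no annotation.

```python
import numpy as np


def stitch_grid(images, order, grid=2):
    """Stitch grid*grid equal-sized single-object images of shape (H, W, C).

    order[k] is the index into `images` of the image placed at grid cell k
    (row-major). Returns the stitched image and the per-cell source ids,
    which serve as object labels obtained for free from the construction.
    """
    h, w, c = images[0].shape
    canvas = np.zeros((grid * h, grid * w, c), dtype=images[0].dtype)
    for k, src in enumerate(order):
        r, col = divmod(k, grid)
        canvas[r * h:(r + 1) * h, col * w:(col + 1) * w] = images[src]
    return canvas, np.asarray(order)


def cell_correspondence(ids_a, ids_b):
    """Boolean (N, N) matrix: entry [i, j] is True when cell i of stitched
    image A and cell j of stitched image B hold the same source object."""
    return np.asarray(ids_a)[:, None] == np.asarray(ids_b)[None, :]


# Toy usage: four constant-colour "single-object" images, stitched twice
# with independent random placements.
rng = np.random.default_rng(0)
images = [np.full((32, 32, 3), i, dtype=np.uint8) for i in range(4)]
img_a, ids_a = stitch_grid(images, rng.permutation(4))
img_b, ids_b = stitch_grid(images, rng.permutation(4))
M = cell_correspondence(ids_a, ids_b)
print(img_a.shape)    # (64, 64, 3)
print(int(M.sum()))   # 4: every object appears exactly once in each image
```

Each row of `M` has exactly one `True`, so it acts as a permutation matrix linking the cells of the two stitched images.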

Problem

Research questions and friction points this paper is trying to address.

Improving unsupervised representations of multi-object images, where contrastive methods developed for single-object-centric images degrade

Providing object correspondences without human annotations

Enhancing detailed representations for complex downstream tasks

Innovation

Methods, ideas, or system contributions that make the work stand out.

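Synthesizing multi-object images by stitching single-object-centric ones, so the objects in each synthesized image are known in advance

Using these predetermined objects as object correspondences between multi-object images, without human annotations

Producing more detailed, object-level representations for downstream tasks such as object detection and semantic segmentation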
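Once stitching supplies object correspondences between two stitched views, per-object embeddings can be trained with a contrastive loss that treats matching cells as positives. The sketch below uses a generic multi-positive InfoNCE in NumPy; the function name `object_infonce`, the temperature `tau=0.2`, and the one-hot toy embeddings are assumptions for illustration, and the exact objective used by MOS may differ.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    # Row-wise L2 normalisation so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def object_infonce(z_a, z_b, positives, tau=0.2):
    """Multi-positive InfoNCE between object-cell embeddings of two views.

    z_a, z_b: (N, D) embeddings of the N cells of stitched views A and B.
    positives: (N, N) bool, True where cell i of A and cell j of B come from
        the same source image (known from the stitching order).
    """
    z_a, z_b = l2_normalize(z_a), l2_normalize(z_b)
    logits = z_a @ z_b.T / tau
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    pos_count = positives.sum(axis=1)
    valid = pos_count > 0  # skip anchors with no matching cell in view B
    # Average log-probability assigned to each anchor's positives.
    mean_pos = (log_prob * positives).sum(axis=1)[valid] / pos_count[valid]
    return float(-mean_pos.mean())


# Toy check with hypothetical one-hot embeddings (one per source object).
order_a = np.array([0, 1, 2, 3])
order_b = np.array([2, 0, 3, 1])
positives = order_a[:, None] == order_b[None, :]
proto = np.eye(4, 16)
z_a, z_b = proto[order_a], proto[order_b]
aligned = object_infonce(z_a, z_b, positives)
shuffled = object_infonce(z_a, z_b, np.roll(positives, 1, axis=1))
```

With these orthogonal toy embeddings, the loss on the true correspondences is log(1 + 3e^{-5}) ≈ 0.020, while shifting every positive to a wrong cell raises it to about 5.02. That gap is what the known correspondences are meant to exploit.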
