Find your Needle: Small Object Image Retrieval via Multi-Object Attention Optimization

📅 2025-03-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Small-object image retrieval (SoIR) in cluttered scenes requires a single, compact image embedding that represents multiple objects while still supporting scalable search. This is the central challenge of the task.

Method: We propose the Multi-object Attention Optimization (MaO) framework, which features (1) a novel multi-object pre-training paradigm that explicitly models multiple objects within an image; (2) a mask-guided, attention-based feature fusion mechanism for fine-grained, object-level feature extraction and aggregation; and (3) a unified image embedding designed to be both discriminative and generalizable. MaO supports both zero-shot transfer and lightweight fine-tuning.

Results: Evaluated on newly constructed SoIR benchmarks, MaO significantly outperforms existing methods, achieving absolute mAP improvements of 12.7% (zero-shot) and 9.3% (fine-tuned), which demonstrates its effectiveness and practicality for real-world SoIR tasks.
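The results above are reported as mAP. As a reference point, here is a minimal sketch of standard retrieval mAP. It assumes that each query comes with a ranked list of gallery image IDs and a set of relevant IDs (the images that actually contain the query object); the function names are hypothetical, and the exact evaluation protocol used in the paper may differ.

```python
"""Minimal sketch of standard retrieval mAP (hypothetical helper names).

Assumption: each query has a ranked list of gallery image IDs and a set of
relevant IDs. This is generic retrieval mAP, not necessarily the paper's protocol.
"""
from typing import Iterable, Sequence


def average_precision(ranked_ids: Sequence[str], relevant: set[str]) -> float:
    """AP = sum of precision@k at each rank k holding a relevant item, / |relevant|."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, image_id in enumerate(ranked_ids, start=1):
        if image_id in relevant:
            hits += 1
            precision_sum += hits / k  # precision at this rank
    # Dividing by |relevant| (not by hits) penalizes relevant images never retrieved.
    return precision_sum / len(relevant)


def mean_average_precision(results: Iterable[tuple[Sequence[str], set[str]]]) -> float:
    """Average the per-query AP values; returns 0.0 when there are no queries."""
    aps = [average_precision(ranked, rel) for ranked, rel in results]
    return sum(aps) / len(aps) if aps else 0.0


if __name__ == "__main__":
    # Query 1: relevant images at ranks 1 and 3 -> AP = (1/1 + 2/3) / 2
    q1 = (["a", "b", "c", "d"], {"a", "c"})
    # Query 2: one relevant image at rank 2 -> AP = (1/2) / 1
    q2 = (["x", "y", "z"], {"y"})
    print(round(mean_average_precision([q1, q2]), 4))  # prints 0.6667
```

Because AP rewards relevant images that appear early in the ranking, a descriptor that lets a small object's images rise to the top of a cluttered gallery will raise mAP even if the total number of retrieved matches is unchanged.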

📝 Abstract

We address the challenge of Small Object Image Retrieval (SoIR), where the goal is to retrieve images that contain a specific small object within a cluttered scene. The key challenge in this setting is constructing a single image descriptor that effectively represents all objects in the image while still enabling scalable and efficient search. In this paper, we first analyze the limitations of existing methods on this challenging task and then introduce new benchmarks to support SoIR evaluation. Next, we introduce Multi-object Attention Optimization (MaO), a novel retrieval framework that incorporates a dedicated multi-object pre-training phase. This is followed by a refinement process that uses attention-based feature extraction with object masks and integrates the resulting object features into a single unified image descriptor. Our MaO approach significantly outperforms existing retrieval methods and strong baselines, achieving notable improvements in both the zero-shot and the lightweight multi-object fine-tuning settings. We hope this work will lay the groundwork for, and inspire, further research on improving retrieval performance for this highly practical task.
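The idea of extracting per-object features with masks and integrating them into one descriptor can be illustrated with a simplified sketch. This is not MaO itself: the paper's multi-object pre-training and attention optimization are not reproduced here. The sketch assumes precomputed patch features from some backbone and one binary mask per object, pools each object's patches with scaled dot-product attention restricted to its mask (using the mean of the object's own patches as a hypothetical query), and averages the L2-normalized object embeddings into a single L2-normalized image descriptor. All function names are hypothetical.

```python
"""Simplified mask-restricted attention pooling into one image descriptor.

Hypothetical sketch only: it does NOT implement MaO's pre-training or its
attention optimization; inputs are assumed precomputed patch features + masks.
"""
import math
from typing import Sequence

Vector = list[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # avoid division by zero
    return [x / norm for x in v]


def masked_attention_pool(patches: Sequence[Sequence[float]], mask: Sequence[bool]) -> Vector:
    """Pool features for ONE object, attending only to patches inside its mask.

    Dropping masked-out patches is equivalent to giving them a score of -inf
    before the softmax.
    """
    inside = [p for p, m in zip(patches, mask) if m]
    if not inside:
        raise ValueError("object mask selects no patches")
    dim = len(patches[0])
    # Hypothetical query vector: the mean feature of the object's own patches.
    query = [sum(p[d] for p in inside) / len(inside) for d in range(dim)]
    scale = 1.0 / math.sqrt(dim)
    scores = [sum(q * x for q, x in zip(query, p)) * scale for p in inside]
    # Numerically stable softmax over the masked-in patches only.
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    return [sum(w * p[d] for w, p in zip(weights, inside)) for d in range(dim)]


def image_descriptor(patches: Sequence[Sequence[float]],
                     object_masks: Sequence[Sequence[bool]]) -> Vector:
    """Fuse per-object embeddings into a single L2-normalized image descriptor."""
    if not object_masks:
        raise ValueError("need at least one object mask")
    per_object = [l2_normalize(masked_attention_pool(patches, m)) for m in object_masks]
    dim = len(per_object[0])
    fused = [sum(e[d] for e in per_object) / len(per_object) for d in range(dim)]
    return l2_normalize(fused)


if __name__ == "__main__":
    # Toy input: 4 patches with 2-D features; object A covers patches 0-1, object B covers 2-3.
    patches = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    masks = [[True, True, False, False], [False, False, True, True]]
    print(image_descriptor(patches, masks))
```

Normalizing each object embedding before averaging keeps a large or high-magnitude object from dominating the fused vector, which is one plausible way to keep a small object visible in a single descriptor; the paper's own fusion may weight objects differently.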

Problem

Research questions and friction points this paper is trying to address.

Retrieve images that contain a specific small object in a cluttered scene.

Construct a single, scalable image descriptor that effectively represents all objects in the image.

Overcome the limitations of existing retrieval methods on this task and the lack of benchmarks for evaluating it.
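A single descriptor per image is what keeps search scalable: retrieval reduces to one similarity computation per gallery image. A minimal sketch, assuming hypothetical descriptors are already computed for the query object and for each gallery image, ranks images by cosine similarity and keeps the top k with `heapq.nlargest`. The file names and vectors are illustrative only.

```python
"""Top-k retrieval over single per-image descriptors (hypothetical example data)."""
import heapq
import math
from typing import Mapping, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def retrieve(query: Sequence[float],
             gallery: Mapping[str, Sequence[float]],
             k: int = 5) -> list[tuple[str, float]]:
    """Rank gallery images by cosine similarity to the query descriptor.

    One descriptor per image means one similarity per image: O(N * D) for
    N images of dimension D, with no per-object matching at query time.
    """
    scored = ((image_id, cosine(query, desc)) for image_id, desc in gallery.items())
    return heapq.nlargest(k, scored, key=lambda item: item[1])


if __name__ == "__main__":
    gallery = {
        "kitchen.jpg": [0.9, 0.1, 0.0],
        "street.jpg": [0.1, 0.9, 0.2],
        "desk.jpg": [0.7, 0.0, 0.7],
    }
    query = [1.0, 0.0, 0.0]  # hypothetical embedding of the small query object
    for image_id, score in retrieve(query, gallery, k=2):
        print(f"{image_id}: {score:.3f}")
```

The linear scan is only for illustration; because each image is a single fixed-length vector, the same gallery could be handed to an approximate nearest-neighbor index without changing the descriptor.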

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-object Attention Optimization (MaO) framework with a dedicated multi-object pre-training phase

Attention-based feature extraction guided by object masks

Unified image descriptor for scalable search

🔎 Similar Papers

Unsupervised Collaborative Metric Learning with Mixed-Scale Groups for General Object Retrieval