Efficient Odd-One-Out Anomaly Detection

📅 2025-09-04

📈 Citations: 0

✨ Influential: 0

career value

180K/year

🤖 AI Summary

This work addresses odd-one-out anomaly detection in multi-object scenes—identifying visually salient outlier objects relative to their contextual surroundings. The task demands cross-view spatial reasoning, context-aware relational modeling, and strong generalization across object categories and spatial layouts. To this end, we propose a lightweight, efficient architecture grounded in DINO features, incorporating multi-view feature fusion and structured relational modeling; it reduces parameter count and training time by approximately two-thirds while matching state-of-the-art detection accuracy and significantly improving inference efficiency. Furthermore, we establish the first systematic multimodal large language model (MLLM) baseline for this task, empirically revealing its fundamental limitations in structured visual reasoning. Our contributions include a novel, efficient paradigm for vision-based anomaly detection and a rigorous empirical benchmark that advances both methodological design and evaluation standards for generalizable, real-time visual anomaly identification.

Technology Category

Application Category

📝 Abstract

The recently introduced odd-one-out anomaly detection task involves identifying the odd-looking instances within a multi-object scene. This problem presents several challenges for modern deep learning models, demanding spatial reasoning across multiple views and relational reasoning to understand context and generalize across varying object categories and layouts. We argue that these challenges must be addressed with efficiency in mind. To this end, we propose a DINO-based model that reduces the number of parameters by one third and shortens training time by a factor of three compared to the current state-of-the-art, while maintaining competitive performance. Our experimental evaluation also introduces a Multimodal Large Language Model baseline, providing insights into its current limitations in structured visual reasoning tasks. The project page can be found at https://silviochito.github.io/EfficientOddOneOut/

Problem

Research questions and friction points this paper is trying to address.

Detecting anomalous objects in multi-object visual scenes

Addressing spatial and relational reasoning challenges efficiently

Reducing model parameters and training time while maintaining performance

Innovation

Methods, ideas, or system contributions that make the work stand out.

DINO-based model reduces parameters by one third

Training time shortened by factor of three

Maintains competitive performance with efficiency

🔎 Similar Papers

No similar papers found.