Efficient Odd-One-Out Anomaly Detection

📅 2025-09-04
🤖 AI Summary
This work addresses odd-one-out anomaly detection in multi-object scenes: identifying the objects that look anomalous relative to the other objects in the scene. The task demands spatial reasoning across multiple views, relational reasoning about context, and generalization across object categories and layouts. The authors propose a DINO-based model that has one third fewer parameters and trains three times faster than the current state of the art while maintaining competitive performance. The evaluation also introduces a Multimodal Large Language Model (MLLM) baseline, which provides insight into the current limitations of such models in structured visual reasoning tasks.

📝 Abstract
The recently introduced odd-one-out anomaly detection task involves identifying the odd-looking instances within a multi-object scene. This problem presents several challenges for modern deep learning models, demanding spatial reasoning across multiple views and relational reasoning to understand context and generalize across varying object categories and layouts. We argue that these challenges must be addressed with efficiency in mind. To this end, we propose a DINO-based model that reduces the number of parameters by one third and shortens training time by a factor of three compared to the current state-of-the-art, while maintaining competitive performance. Our experimental evaluation also introduces a Multimodal Large Language Model baseline, providing insights into its current limitations in structured visual reasoning tasks. The project page can be found at https://silviochito.github.io/EfficientOddOneOut/
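To make the task concrete, here is a minimal sketch of odd-one-out scoring, assuming each object in the scene has already been reduced to a single feature vector. It is not the authors' architecture: the function names, the cosine-similarity rule, and the toy vectors are all assumptions chosen for illustration. Each object is scored by how dissimilar it is, on average, from the other objects in the same scene, so the score is relative to context rather than to a fixed notion of normality.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def odd_one_out_scores(features):
    """Score each object as 1 - mean cosine similarity to the other objects.

    Hypothetical helper: assumes at least two objects per scene.
    """
    scores = []
    for i, f in enumerate(features):
        sims = [cosine(f, g) for j, g in enumerate(features) if j != i]
        scores.append(1.0 - sum(sims) / len(sims))
    return scores


# Toy per-object embeddings (made up): three similar objects and one outlier.
features = [
    [1.0, 0.0, 0.1],
    [0.9, 0.1, 0.0],
    [1.0, 0.1, 0.1],
    [0.0, 1.0, 0.2],
]
scores = odd_one_out_scores(features)
odd_index = max(range(len(scores)), key=scores.__getitem__)
print(odd_index, [round(s, 3) for s in scores])
```

In this toy scene the last object receives the highest score. A learned model would replace the fixed similarity rule with trained relational reasoning over features from multiple views.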
Problem

Research questions and friction points this paper is trying to address.

Detecting anomalous objects in multi-object visual scenes
Addressing spatial and relational reasoning challenges efficiently
Reducing model parameters and training time while maintaining performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

DINO-based model reduces parameters by one third
Training time shortened by factor of three
Maintains competitive performance with efficiency
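The two efficiency claims use different phrasings that are easy to conflate: "by one third" removes a fraction of the baseline, while "by a factor of three" divides the baseline. The sketch below works through the arithmetic with made-up baseline values, which are assumptions for illustration only and are not figures from the paper.

```python
def after_reduction_by_fraction(baseline, fraction):
    """Value left after removing `fraction` of the baseline."""
    return baseline * (1.0 - fraction)


def after_speedup_by_factor(baseline, factor):
    """Value left after dividing the baseline by `factor`."""
    return baseline / factor


# Hypothetical baseline numbers, not taken from the paper.
baseline_params_millions = 90.0
baseline_train_hours = 30.0

params = after_reduction_by_fraction(baseline_params_millions, 1.0 / 3.0)
hours = after_speedup_by_factor(baseline_train_hours, 3.0)

# Fraction of the baseline that remains in each case.
params_remaining = params / baseline_params_millions
hours_remaining = hours / baseline_train_hours
print(round(params_remaining, 3), round(hours_remaining, 3))
```

Under these assumptions about two thirds of the parameters remain, while only about one third of the training time remains.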
Silvio Chito
Politecnico di Torino, Italy
Paolo Rabino
Politecnico di Torino, Italy
Tatiana Tommasi
Politecnico di Torino
machine learning, computer vision, artificial intelligence