Random Similarity Isolation Forests

📅 2025-02-26
🤖 AI Summary
Existing outlier detection methods are largely dedicated to a single type of data. Handling heterogeneous features such as time series, images, and graphs therefore requires either fusing several data-specific models or converting every representation into one format, and both options can hinder predictive performance. This work proposes Random Similarity Isolation Forest, a multi-modal outlier detection algorithm that combines isolation with similarity-based projection to handle datasets with mixtures of features of arbitrary data types. In experiments on 47 benchmark datasets, the method outperforms five state-of-the-art competitors. The results indicate that using multiple modalities can improve the detection of anomalies, and they highlight the need for outlier detection benchmarks tailored to multi-modal algorithms.

📝 Abstract
With predictive models becoming prevalent, companies are expanding the types of data they gather. As a result, the collected datasets consist not only of simple numerical features but also more complex objects such as time series, images, or graphs. Such multi-modal data have the potential to improve performance in predictive tasks like outlier detection, where the goal is to identify objects deviating from the main data distribution. However, current outlier detection algorithms are dedicated to individual types of data. Consequently, working with mixed types of data requires either fusing multiple data-specific models or transforming all of the representations into a single format, both of which can hinder predictive performance. In this paper, we propose a multi-modal outlier detection algorithm called Random Similarity Isolation Forest. Our method combines the notions of isolation and similarity-based projection to handle datasets with mixtures of features of arbitrary data types. Experiments performed on 47 benchmark datasets demonstrate that Random Similarity Isolation Forest outperforms five state-of-the-art competitors. Our study shows that the use of multiple modalities can indeed improve the detection of anomalies and highlights the need for new outlier detection benchmarks tailored for multi-modal algorithms.

Problem

Research questions and friction points this paper is trying to address.

Current outlier detection algorithms are dedicated to individual data types
Fusing data-specific models or converting all data to a single format can hinder performance
Outlier detection benchmarks tailored to multi-modal algorithms are lacking

Innovation

Methods, ideas, or system contributions that make the work stand out.

Isolation and similarity-based projection
Handles arbitrary data types
Outperforms five state-of-the-art competitors on 47 benchmark datasets
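
The idea of combining isolation with similarity-based projection can be illustrated with a small Python sketch. It is not the authors' implementation. It assumes that each tree node picks one random feature and two random reference objects, projects every object as its distance to the first reference minus its distance to the second, and splits at a random threshold, with path lengths turned into scores as in the original Isolation Forest. All names, distance functions, parameter values, and the toy data are hypothetical.

```python
import math
import random

# Illustrative sketch only; NOT the authors' implementation. The projection
# rule, split rule, and every name below are assumptions made for this example.

def jaccard(a, b):
    """Jaccard distance between two non-empty sets."""
    return 1.0 - len(a & b) / len(a | b)

# One distance function per feature: a number, a short series, a set of tags.
DISTANCES = [lambda a, b: abs(a - b), math.dist, jaccard]

def avg_path(n):
    """Average path length adjustment used by the original Isolation Forest."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n

def project(x, node):
    """Similarity-based projection: distance to reference p minus distance to q."""
    f = node["f"]
    dist = DISTANCES[f]
    return dist(x[f], node["p"][f]) - dist(x[f], node["q"][f])

def build_tree(rows, rng, depth=0, max_depth=8):
    """Recursively partition rows with random projections and thresholds."""
    if len(rows) <= 1 or depth >= max_depth:
        return {"size": len(rows)}
    p, q = rng.sample(rows, 2)  # two random reference objects
    node = {"f": rng.randrange(len(DISTANCES)), "p": p, "q": q}
    proj = [project(r, node) for r in rows]
    lo, hi = min(proj), max(proj)
    if lo == hi:  # both references coincide on this feature, so stop here
        return {"size": len(rows)}
    node["t"] = rng.uniform(lo, hi)  # random split threshold
    left = [r for r, v in zip(rows, proj) if v < node["t"]]
    right = [r for r, v in zip(rows, proj) if v >= node["t"]]
    node["left"] = build_tree(left, rng, depth + 1, max_depth)
    node["right"] = build_tree(right, rng, depth + 1, max_depth)
    return node

def path_length(x, node):
    """Number of splits x passes, plus an adjustment for unsplit leaves."""
    depth = 0
    while "t" in node:
        node = node["left"] if project(x, node) < node["t"] else node["right"]
        depth += 1
    return depth + avg_path(node["size"])

def anomaly_scores(rows, n_trees=200, seed=0):
    """Score in (0, 1); shorter average paths give higher scores."""
    rng = random.Random(seed)
    trees = [build_tree(rows, rng) for _ in range(n_trees)]
    norm = avg_path(len(rows))
    return [
        2.0 ** (-sum(path_length(x, t) for t in trees) / n_trees / norm)
        for x in rows
    ]

# Toy mixed-type data: 30 ordinary objects plus one unusual in every feature.
data_rng = random.Random(42)
rows = [
    (
        data_rng.gauss(0, 1),
        tuple(data_rng.gauss(0, 1) for _ in range(5)),
        frozenset(data_rng.sample(list("abcdefgh"), 3)),
    )
    for _ in range(30)
]
rows.append((25.0, (10.0,) * 5, frozenset("xyz")))
scores = anomaly_scores(rows)
mean_inlier = sum(scores[:-1]) / (len(scores) - 1)
print(f"outlier score: {scores[-1]:.3f}, mean inlier score: {mean_inlier:.3f}")
```

Because each feature only needs a distance function, the same tree logic applies to numbers, series, and sets without converting them to a shared representation. In this sketch an object can only be separated from all others at a node where it is drawn as a reference, so an object far from the rest under the chosen distance tends to be isolated after fewer splits and receives a higher score.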
Sebastian Chwilczyński
Institute of Computing Science, Poznan University of Technology, Poland
Dariusz Brzezinski
Poznan University of Technology
machine learning, evaluation metrics, bioinformatics, data stream mining