AI Summary
This paper addresses the insufficient robustness of existing out-of-distribution (OOD) detection methods under adversarial attacks. We propose a unified framework that jointly optimizes classification and OOD detection. Our core innovation is an adversarial OOD detection objective based on auxiliary classifiers, which explicitly models the trade-off between clean and robust performance, enabling defense against transfer-based attacks and performance tuning via hyperparameter selection. Built upon the TRADES robust classification framework, our method introduces a plug-and-play joint loss term that is compatible with mainstream OOD detectors without requiring architectural modifications. Extensive experiments across multiple datasets and attack settings demonstrate state-of-the-art performance: average AUROC improves by 3.15 points under clean conditions and 7.07 points under adversarial conditions, while exhibiting strong generalization across diverse attack types.
Abstract
Effective out-of-distribution (OOD) detection is crucial for the safe deployment of machine learning models in real-world scenarios. However, recent work has shown that OOD detection methods are vulnerable to adversarial attacks, potentially leading to critical failures in high-stakes applications. This discovery has motivated work on robust OOD detection methods that are capable of maintaining performance under various attack settings. Prior approaches have made progress on this problem but face notable limitations: they are often robust only to attacks on OOD data, or fail to maintain strong clean performance. In this work, we adapt an existing robust classification framework, TRADES, extending it to the problem of robust OOD detection and deriving a novel objective function. Recognising the critical importance of a strong clean/robust trade-off for OOD detection, we introduce an additional loss term which boosts classification and detection performance. Our approach, called HALO (Helper-based AdversariaL OOD detection), surpasses existing methods and achieves state-of-the-art performance across a number of datasets and attack settings. Extensive experiments demonstrate an average AUROC improvement of 3.15 in clean settings and 7.07 under adversarial attacks when compared to the next best method. Furthermore, HALO exhibits resistance to transferred attacks, offers tuneable performance through hyperparameter selection, and is compatible with existing OOD detection frameworks out-of-the-box, leaving open the possibility of future performance gains. Code is available at: https://github.com/hugo0076/HALO
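Since the abstract says the method builds on the TRADES robust classification objective, a minimal sketch of that base loss may help orient the reader. The sketch below shows only the standard TRADES term (clean cross-entropy plus a KL regularizer pulling adversarial predictions toward clean ones, with `beta` governing the clean/robust trade-off); the HALO-specific helper and OOD detection terms are not specified here and are therefore omitted. The function name `trades_style_loss` is illustrative, not from the paper.

```python
import torch
import torch.nn.functional as F

def trades_style_loss(model, x, y, x_adv, beta=6.0):
    """TRADES-style objective (Zhang et al., 2019):
    clean cross-entropy + beta * KL(clean || adversarial predictions).
    `beta` trades off clean accuracy against adversarial robustness."""
    logits_clean = model(x)
    logits_adv = model(x_adv)
    # Standard classification loss on clean inputs.
    ce = F.cross_entropy(logits_clean, y)
    # KL term encouraging adversarial predictions to match clean ones.
    kl = F.kl_div(
        F.log_softmax(logits_adv, dim=1),
        F.softmax(logits_clean, dim=1),
        reduction="batchmean",
    )
    return ce + beta * kl
```

In practice `x_adv` would be generated by an inner PGD-style maximization of the KL term; larger `beta` favours robustness over clean accuracy, mirroring the tuneable trade-off the abstract highlights.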