Outlier Synthesis via Hamiltonian Monte Carlo for Out-of-Distribution Detection

📅 2025-01-28
🤖 AI Summary
To address the scarcity of high-quality natural out-of-distribution (OOD) samples in OOD detection, this paper proposes HamOS, which it presents as the first unsupervised outlier synthesis framework based on Hamiltonian Monte Carlo (HMC). HamOS requires only in-distribution data and does not rely on auxiliary OOD data or pretrained generative models. It models synthesis as sampling from Markov chains that traverse the feature space along gradients, and an HMC acceptance rate of nearly 1 yields a low-overhead, training-free synthesis mechanism that produces diverse, representative synthetic outliers. Evaluated on standard and large-scale benchmarks, HamOS consistently outperforms state-of-the-art methods, with significant gains in OOD detection accuracy and a reported 3–5× improvement in synthesis efficiency.

📝 Abstract
Out-of-distribution (OOD) detection is crucial for developing trustworthy and reliable machine learning systems. Recent work shows that training with auxiliary OOD data improves detection capability. However, these methods rely heavily on acquiring a large pool of high-quality natural outliers. Some prior methods try to alleviate this problem by synthesizing virtual outliers, but they suffer from either poor quality or high cost because of monotonous sampling strategies and heavily parameterized generative models. In this paper, we address these problems with the Hamiltonian Monte Carlo Outlier Synthesis (HamOS) framework, which treats the synthesis process as sampling from Markov chains. Based solely on the in-distribution data, the Markov chains can extensively traverse the feature space and generate diverse and representative outliers, exposing the model to a wide range of potential OOD scenarios. Because Hamiltonian Monte Carlo attains a sampling acceptance rate close to 1, the framework is also efficient. Empirical comparisons with state-of-the-art (SOTA) baselines on both standard and large-scale benchmarks verify the efficacy and efficiency of HamOS.
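As a rough illustration of why Hamiltonian Monte Carlo can reach an acceptance rate close to 1, the sketch below runs a generic HMC chain with a leapfrog integrator. It is not the HamOS synthesis procedure: the potential `u` is a toy standard Gaussian chosen only for illustration, and the names `leapfrog`, `hmc_chain`, `step_size`, and `n_steps` are hypothetical. HamOS would presumably replace `u` with an energy defined on the learned feature space, which the abstract does not specify.

```python
import numpy as np

def leapfrog(q, p, grad_u, step_size, n_steps):
    """Integrate Hamiltonian dynamics with the leapfrog scheme."""
    q = np.array(q, dtype=float)
    p = p - 0.5 * step_size * grad_u(q)        # initial half step for momentum
    for i in range(n_steps):
        q = q + step_size * p                  # full step for position
        if i < n_steps - 1:
            p = p - step_size * grad_u(q)      # full step for momentum
    p = p - 0.5 * step_size * grad_u(q)        # final half step for momentum
    return q, p

def hmc_chain(q0, u, grad_u, n_samples, step_size=0.1, n_steps=10, seed=0):
    """Run one HMC Markov chain; return the samples and the acceptance rate."""
    rng = np.random.default_rng(seed)
    q = np.array(q0, dtype=float)
    samples, accepted = [], 0
    for _ in range(n_samples):
        p0 = rng.standard_normal(q.shape)      # resample momentum each iteration
        q_new, p_new = leapfrog(q, p0, grad_u, step_size, n_steps)
        h_old = u(q) + 0.5 * p0 @ p0           # Hamiltonian before the trajectory
        h_new = u(q_new) + 0.5 * p_new @ p_new # Hamiltonian after the trajectory
        # Metropolis correction: accept with probability min(1, exp(h_old - h_new)).
        if np.log(rng.uniform()) < h_old - h_new:
            q = q_new
            accepted += 1
        samples.append(q.copy())
    return np.array(samples), accepted / n_samples

# Toy potential (an assumption for illustration): standard Gaussian in 2D.
def u(q):
    return 0.5 * q @ q

def grad_u(q):
    return q

samples, rate = hmc_chain(np.zeros(2), u, grad_u, n_samples=500)
print(samples.shape, rate)
```

Because the leapfrog integrator nearly conserves the Hamiltonian when the step size is small, the energy difference `h_old - h_new` stays close to zero and almost every proposal is accepted. Few rejected proposals means little wasted gradient computation per sample, which is the efficiency argument the abstract makes.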
Problem

Research questions and friction points this paper is trying to address.

Machine Learning
Data Quality
Anomaly Detection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hamiltonian Monte Carlo Outlier Synthesis
data synthesis
anomaly detection enhancement
Hengzhuang Li
Master's Student, Huazhong University of Science and Technology
Open-world Machine Learning, Multimodal Foundation Models
Teng Zhang
National Engineering Research Center for Big Data Technology and System, Service Computing Technology and Systems Laboratory, Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China