SAMOSA: Sharpness Aware Minimization for Open Set Active learning

📅 2025-10-19
🤖 AI Summary
Open-set active learning faces two key challenges: the unlabeled pool contains samples from unknown classes, and annotation is costly. To address these, we propose SAMOSA, an active learning algorithm that combines Sharpness-Aware Minimization (SAM) with sample typicality. SAMOSA queries atypical samples that lie in regions of the embedding manifold close to the model's decision boundaries, so the selected samples are both informative for the target classes and useful for separating target classes from unwanted (unknown-class) ones. The approach builds on theoretical findings about how data typicality affects the generalization of stochastic gradient descent (SGD) and SAM, and it does not introduce computational overhead. Experiments across several datasets show that SAMOSA improves accuracy by up to 3% over state-of-the-art methods.

📝 Abstract
Modern machine learning solutions require extensive data collection where labeling remains costly. To reduce this burden, open set active learning approaches aim to select informative samples from a large pool of unlabeled data that includes irrelevant or unknown classes. In this context, we propose Sharpness Aware Minimization for Open Set Active Learning (SAMOSA) as an effective querying algorithm. Building on theoretical findings concerning the impact of data typicality on the generalization properties of traditional stochastic gradient descent (SGD) and sharpness-aware minimization (SAM), SAMOSA actively queries samples based on their typicality. SAMOSA effectively identifies atypical samples that belong to regions of the embedding manifold close to the model decision boundaries. Therefore, SAMOSA prioritizes the samples that are (i) highly informative for the targeted classes, and (ii) useful for distinguishing between targeted and unwanted classes. Extensive experiments show that SAMOSA achieves up to 3% accuracy improvement over the state of the art across several datasets, while not introducing computational overhead. The source code of our experiments is available at: https://anonymous.4open.science/r/samosa-DAF4
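
For readers unfamiliar with the two optimizers the abstract contrasts, the following is a minimal sketch of one plain gradient step next to one SAM step. It is not taken from the SAMOSA code: the toy quadratic loss, the function names (`loss_grad`, `sgd_step`, `sam_step`), and the values of `lr` and `rho` are assumptions chosen for illustration, and the typicality-based querying itself is not shown.

```python
import math


def loss_grad(w):
    """Gradient of a toy loss L(w) = 0.5 * sum(a_i * w_i**2).

    The curvatures are hypothetical; a real model would supply
    gradients of its training loss on a mini-batch instead.
    """
    curvatures = [10.0, 1.0]
    return [a * wi for a, wi in zip(curvatures, w)]


def sgd_step(w, grad_fn, lr=0.1):
    """One plain gradient descent step: w <- w - lr * grad(w)."""
    g = grad_fn(w)
    return [wi - lr * gi for wi, gi in zip(w, g)]


def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One sharpness-aware minimization (SAM) step.

    1. Compute the gradient at the current weights.
    2. Move to the approximate worst-case point inside an L2 ball
       of radius rho: w_adv = w + rho * g / ||g||.
    3. Compute the gradient at w_adv and apply it to the ORIGINAL w.
    """
    g = grad_fn(w)
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        # Already at a stationary point; no perturbation direction exists.
        return list(w)
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    g_adv = grad_fn(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]


if __name__ == "__main__":
    w0 = [1.0, 1.0]
    print("SGD step:", sgd_step(w0, loss_grad))
    print("SAM step:", sam_step(w0, loss_grad))
```

The only difference between the two steps is where the gradient is evaluated: SAM uses the gradient at a perturbed point, which biases the update toward flatter regions of the loss.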
Problem

Research questions and friction points this paper is trying to address.

Reduces labeling costs via open set active learning
Identifies atypical samples near decision boundaries
Distinguishes targeted classes from irrelevant ones
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses sharpness-aware minimization for open set learning
Queries samples based on data typicality near boundaries
Identifies atypical samples to distinguish target classes
Young In Kim
Department of Computer Science, Purdue University, West Lafayette, IN 47906, USA
Andrea Agiollo
Department of Computer Science, Delft University of Technology, Delft, Netherlands
Rajiv Khanna
Assistant Professor, Purdue CS
Machine Learning, Big Data Algorithms