Unsupervised Learning for Class Distribution Mismatch

📅 2025-05-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses unsupervised open-set recognition under class distribution mismatch (CDM) between training and target tasks. Methodologically, it introduces the first fully unsupervised CDM modeling framework: (1) a novel diffusion-based paradigm for generating semantically controllable positive–negative sample pairs, enabling precise addition or removal of class semantics; and (2) a confidence-driven dynamic pseudo-labeling iteration mechanism that eliminates reliance on labeled data or semi-supervised assumptions. Evaluated on Tiny-ImageNet with 60% distribution mismatch, the method significantly outperforms OpenMatch (trained with 40 labels per class), achieving absolute improvements of 35.1%, 63.7%, and 72.5% in accuracy for known classes, unknown classes, and novel classes, respectively. These results demonstrate superior joint identification capability in open-world scenarios under severe distribution shift.

📝 Abstract
Class distribution mismatch (CDM) refers to the discrepancy between class distributions in training data and target tasks. Previous methods address this by designing classifiers to categorize classes known during training, while grouping unknown or new classes into an "other" category. However, they focus on semi-supervised scenarios and heavily rely on labeled data, limiting their applicability and performance. To address this, we propose Unsupervised Learning for Class Distribution Mismatch (UCDM), which constructs positive-negative pairs from unlabeled data for classifier training. Our approach randomly samples images and uses a diffusion model to add or erase semantic classes, synthesizing diverse training pairs. Additionally, we introduce a confidence-based labeling mechanism that iteratively assigns pseudo-labels to valuable real-world data and incorporates them into the training process. Extensive experiments on three datasets demonstrate UCDM's superiority over previous semi-supervised methods. Specifically, with a 60% mismatch proportion on the Tiny-ImageNet dataset, our approach, without relying on labeled data, surpasses OpenMatch (with 40 labels per class) by 35.1%, 63.7%, and 72.5% in classifying known, unknown, and new classes.
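As a rough illustration of how such diffusion-synthesized positive-negative pairs could supervise a one-vs-rest classifier: the sketch below substitutes toy Gaussian features for real encoder features of the "semantics added" and "semantics erased" images, then fits a logistic classifier with binary cross-entropy. The feature dimensions, learning rate, and cluster separation are invented for illustration and are not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for diffusion-synthesized pairs for one target class:
# "positive" features (class semantics added) vs. "negative" features
# (class semantics erased). Real UCDM inputs would be encoded images.
pos = rng.normal(loc=+1.0, scale=0.5, size=(64, 8))
neg = rng.normal(loc=-1.0, scale=0.5, size=(64, 8))

X = np.vstack([pos, neg])
y = np.concatenate([np.ones(64), np.zeros(64)])

# One-vs-rest logistic classifier trained by gradient descent on BCE.
w, b = np.zeros(8), 0.0
for _ in range(200):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid probabilities
    grad_w = X.T @ (p - y) / len(y)          # BCE gradient w.r.t. w
    grad_b = (p - y).mean()                  # BCE gradient w.r.t. b
    w -= 0.5 * grad_w
    b -= 0.5 * grad_b

# Training accuracy on the synthesized pairs (well separated here).
acc = (((1.0 / (1.0 + np.exp(-(X @ w + b)))) > 0.5) == y).mean()
```

In the full method one such binary decision would exist per known class, with unmatched samples falling into the "other" category.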
Problem

Research questions and friction points this paper is trying to address.

Addressing class distribution mismatch without labeled data
Training classifiers using synthesized positive-negative pairs
Improving classification of known and unknown classes
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses diffusion model to synthesize training pairs
Employs confidence-based pseudo-labeling for real-world data
Trains classifier with positive-negative pairs from unlabeled data
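The confidence-based pseudo-labeling step in the list above can be sketched in plain NumPy. The function name and the 0.95 confidence threshold are illustrative assumptions, not values from the paper; the idea is simply that only high-confidence unlabeled samples receive pseudo-labels each iteration.

```python
import numpy as np

def confidence_pseudo_label(probs, threshold=0.95):
    """Assign pseudo-labels only to samples whose top-class
    confidence exceeds the threshold; returns (indices, labels)."""
    conf = probs.max(axis=1)        # top-class confidence per sample
    labels = probs.argmax(axis=1)   # candidate pseudo-label per sample
    keep = conf >= threshold        # confident samples only
    return np.nonzero(keep)[0], labels[keep]

# Toy predicted class probabilities for 4 unlabeled images, 3 classes.
probs = np.array([
    [0.97, 0.02, 0.01],  # confident -> pseudo-labeled as class 0
    [0.40, 0.35, 0.25],  # uncertain -> left unlabeled this round
    [0.05, 0.94, 0.01],  # just below threshold -> left unlabeled
    [0.01, 0.01, 0.98],  # confident -> pseudo-labeled as class 2
])
idx, labels = confidence_pseudo_label(probs, threshold=0.95)
```

In the iterative scheme, the pseudo-labeled subset would be folded into the next training round, and the thresholding repeated as the classifier improves.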
Pan Du
School of Information, Renmin University of China, and Engineering Research Center of Database and Business Intelligence, MOE, China
Wangbo Zhao
National University of Singapore
Efficient Deep Learning, Dynamic Neural Network, Multimodal Model
Xinai Lu
School of Agricultural Economics and Rural Development, Renmin University of China
Nian Liu
Independent Researcher
Zhikai Li
Institute of Automation, Chinese Academy of Sciences
Chaoyu Gong
NTU
Suyun Zhao
School of Information, Renmin University of China, and Engineering Research Center of Database and Business Intelligence, MOE, China
Hong Chen
School of Information, Renmin University of China, and Engineering Research Center of Database and Business Intelligence, MOE, China
Cuiping Li
Renmin University of China
Database, big data analysis and mining
Kai Wang
National University of Singapore
Yang You
Postdoc, Stanford University
3D vision, computer graphics, computational geometry