🤖 AI Summary
To address high annotation costs, biases in viewpoint, illumination, and class distribution, and poor cross-domain generalization in garbage image classification, this paper proposes an unsupervised category discovery method. The approach combines a dual-encoder contrastive learning framework, in which a ConvNeXt backbone encodes images and a ViT-based generator supplies positive samples, with a multi-clustering ensemble voting mechanism that integrates K-means, spectral clustering, and DBSCAN. Category structure is discovered without human annotations. The method mitigates style shifts and achieves 93.78% and 98.29% accuracy on the TrashNet and Huawei Cloud datasets, respectively. On 4,169 real-world images, calibrating with only 50 labeled samples improves accuracy by 29.85% compared to supervised models. The result is high accuracy, strong cross-domain generalization, and substantially lower labeling cost, which supports practical deployment of garbage classification systems.
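The summary does not spell out how the votes from the three clusterers are combined, so the following is only a minimal sketch of one common way to do it, not the paper's implementation. It assumes each clusterer has already produced one integer label per image, aligns the arbitrary cluster ids to the first labeling by largest overlap, and then takes a per-sample majority vote. The names `align_to_reference`, `vote`, and `NOISE` are hypothetical, and treating DBSCAN's `-1` noise label as an abstention is an assumption.

```python
from collections import Counter

NOISE = -1  # DBSCAN-style "no cluster" label, treated here as an abstention (assumption)


def align_to_reference(reference, labels):
    """Rename each cluster id in `labels` to the reference id it overlaps most."""
    overlap = Counter(
        (lab, ref)
        for lab, ref in zip(labels, reference)
        if lab != NOISE and ref != NOISE
    )
    mapping = {}
    # Walk (label, reference) pairs from largest overlap to smallest and keep
    # the first reference id seen for each label.
    for (lab, ref), _count in overlap.most_common():
        mapping.setdefault(lab, ref)
    return [mapping.get(lab, NOISE) for lab in labels]


def vote(labelings):
    """Per-sample majority vote over several clusterings of the same samples."""
    reference = labelings[0]
    aligned = [list(reference)] + [
        align_to_reference(reference, labels) for labels in labelings[1:]
    ]
    consensus = []
    for sample_votes in zip(*aligned):
        counts = Counter(v for v in sample_votes if v != NOISE)
        consensus.append(counts.most_common(1)[0][0] if counts else NOISE)
    return consensus


# Toy labelings standing in for K-means, spectral clustering, and DBSCAN output.
kmeans_labels = [0, 0, 0, 1, 1, 1]
spectral_labels = [1, 1, 1, 0, 0, 1]  # same grouping, swapped ids, one disagreement
dbscan_labels = [0, 0, -1, 1, 1, 1]   # one sample marked as noise
print(vote([kmeans_labels, spectral_labels, dbscan_labels]))
```

The overlap-based alignment is greedy and may map two cluster ids to the same reference id; an optimal one-to-one assignment (for example, the Hungarian algorithm) would be a stricter alternative.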
📝 Abstract
Waste classification is crucial for improving processing efficiency and reducing environmental pollution. Supervised deep learning methods are commonly used for automated waste classification, but they rely heavily on large labeled datasets, and manual labeling is both costly and inefficient. Real-world waste data often exhibit category and style biases, such as variations in camera angle, lighting conditions, and waste type, which can degrade a model's performance and generalization ability. Constructing a bias-free dataset is therefore essential. While self-supervised learning helps address data scarcity, it still depends on some labeled data and generally yields lower accuracy than supervised methods. Unsupervised methods show potential in certain cases but typically do not perform as well as supervised models, highlighting the need for an efficient and cost-effective unsupervised approach. This study presents a novel unsupervised method, Dual-Encoder Contrastive Learning with Multi-Clustering Voting (DECMCV). The approach uses a pre-trained ConvNeXt model for image encoding, leverages a Vision Transformer to generate positive samples, and applies a multi-clustering voting mechanism to address data labeling and domain shift issues. Experimental results demonstrate that DECMCV achieves classification accuracies of 93.78% and 98.29% on the TrashNet and Huawei Cloud datasets, respectively, outperforming or matching supervised models. On a real-world dataset of 4,169 waste images, only 50 labeled samples were needed to accurately label thousands, improving classification accuracy by 29.85% compared to supervised models. This method effectively addresses style differences, enhances model generalization, and contributes to the advancement of automated waste classification.
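The abstract does not state which contrastive objective DECMCV uses. As a rough illustration of the dual-encoder idea, the sketch below assumes an InfoNCE-style loss in which the anchor embedding of image i (standing in for the ConvNeXt output) is pulled toward the positive embedding of the same image (standing in for the Vision Transformer output), while the positives of the other images in the batch act as negatives. The function names, the temperature value, and the toy vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch.

    positives[i] is the positive for anchors[i]; every other entry of
    `positives` serves as a negative for that anchor.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denominator - logits[i]
    return total / len(anchors)


# Toy embeddings: in practice these would come from the two encoders.
anchor_embeddings = [[1.0, 0.0], [0.0, 1.0]]
matched_positives = [[1.0, 0.0], [0.0, 1.0]]
mismatched_positives = [[0.0, 1.0], [1.0, 0.0]]

# The loss should be small when each anchor agrees with its own positive
# and large when the pairing is wrong.
print(info_nce(anchor_embeddings, matched_positives))
print(info_nce(anchor_embeddings, mismatched_positives))
```

Under this assumed objective, the two encoders only need to produce embeddings of the same dimension; how the paper actually derives positives from the Vision Transformer is not specified here.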