🤖 AI Summary
To address the high memory consumption, low computational efficiency, and lack of theoretical convergence guarantees of traditional K-means–based clustering on high-dimensional data, this paper proposes a robust clustering framework based on Stochastic Quantization (SQ). To our knowledge, it is the first to systematically integrate SQ, a method with strong theoretical convergence guarantees, into both unsupervised and semi-supervised clustering. The framework uses a Triplet Network to embed high-dimensional data into a low-dimensional latent space and leverages mini-batch optimization for scalability. On partially labeled image classification tasks, the approach converges significantly faster and uses less memory than K-means++ and mini-batch K-means while attaining superior clustering performance, demonstrating that its theoretical convergence guarantees translate into practical efficacy.
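The Triplet Network mentioned above is typically trained with a triplet margin loss, which pulls embeddings of same-class samples together and pushes different-class embeddings apart. A minimal NumPy sketch follows; the margin value and the use of Euclidean distance are illustrative assumptions, not the paper's exact setup:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Triplet margin loss on embedding vectors.

    Encourages d(anchor, positive) + margin <= d(anchor, negative).
    The margin of 1.0 is an illustrative choice.
    """
    d_pos = np.linalg.norm(anchor - positive)  # distance to same-class sample
    d_neg = np.linalg.norm(anchor - negative)  # distance to different-class sample
    return max(d_pos - d_neg + margin, 0.0)
```

The loss is zero once the negative is at least `margin` farther from the anchor than the positive, so well-separated triplets stop contributing gradient.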
📝 Abstract
This paper addresses the limitations of traditional vector quantization (clustering) algorithms, particularly K-means and its variant K-means++, and explores the stochastic quantization (SQ) algorithm as a scalable alternative for high-dimensional unsupervised and semi-supervised learning problems. Traditional clustering algorithms use memory inefficiently, requiring all data samples to be loaded into memory at once, which becomes impractical for large-scale datasets. Variants such as mini-batch K-means partially mitigate this issue by reducing memory usage, but they lack robust theoretical convergence guarantees due to the non-convex nature of clustering problems. In contrast, the SQ algorithm provides strong theoretical convergence guarantees, making it a robust alternative for clustering tasks. We demonstrate the computational efficiency and rapid convergence of the algorithm on an image classification problem with partially labeled data. To address the challenge of high dimensionality, we trained a Triplet Network to encode images into low-dimensional representations in a latent space, which serve as the basis for comparing the efficiency of the SQ algorithm against traditional quantization algorithms.
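The core idea of stochastic quantization can be sketched as stochastic gradient descent on the quantization centers: each mini-batch sample pulls its nearest center toward it with a decaying step size, so the full dataset never needs to reside in memory. This is a minimal illustrative sketch, not the paper's implementation; the batch size and the 1/t learning-rate schedule are assumptions chosen for simplicity:

```python
import numpy as np

def stochastic_quantization(X, n_centers, n_steps=500, batch_size=32, seed=0):
    """SGD-style vector quantization sketch.

    Centers are updated from small mini-batches, so memory usage is
    independent of the dataset size. Hyperparameters are illustrative.
    """
    rng = np.random.default_rng(seed)
    # Initialize centers from randomly chosen data samples.
    centers = X[rng.choice(len(X), n_centers, replace=False)].astype(float)
    for t in range(1, n_steps + 1):
        batch = X[rng.choice(len(X), batch_size, replace=False)]
        lr = 1.0 / t  # decaying step size, needed for convergence
        for x in batch:
            # Assign the sample to its nearest center...
            j = np.argmin(np.linalg.norm(centers - x, axis=1))
            # ...and move that center toward the sample (one SGD step).
            centers[j] += lr * (x - centers[j])
    return centers
```

Because every update is a convex combination of a center and a data point, the centers stay inside the data's bounding box, and the decaying step size damps oscillations as training proceeds.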