🤖 AI Summary
This study addresses the challenge of achieving global clustering in distributed networks where each user can communicate only with its neighbors and possesses local data. To this end, the authors propose a novel distributed centroid initialization method inspired by K-means++, integrated with a distributed gradient-based clustering algorithm, enabling effective global clustering under strict local communication constraints. Experimental results demonstrate that the proposed initialization strategy significantly outperforms random initialization, yielding improved clustering performance and enhanced robustness to the choice of initial centroids. Notably, the approach even surpasses centralized gradient clustering methods on certain evaluation metrics, highlighting its effectiveness in decentralized settings.
📝 Abstract
We study the effects of center initialization on the performance of a family of distributed gradient-based clustering algorithms introduced in [1], which operate over connected networks of users. In the considered scenario, each user holds a local dataset and communicates only with its immediate neighbours, with the aim of finding a global clustering of the joint data. We perform extensive numerical experiments evaluating the effects of center initialization, and demonstrate that our methods are more resilient to initialization than centralized gradient clustering [2]. Next, inspired by the $K$-means++ initialization [3], we propose a novel distributed center initialization scheme, which is shown to improve the performance of our methods over the baseline random initialization.
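For context, the classical (centralized) $K$-means++ seeding [3] that inspires the proposed distributed scheme can be sketched as follows. This is a minimal illustration of standard $D^2$-weighted sampling, not the authors' distributed variant; the function name and interface are illustrative only:

```python
import numpy as np

def kmeans_pp_init(data, k, seed=None):
    """Centralized K-means++ seeding [3]: after a uniformly random first
    center, each subsequent center is drawn with probability proportional
    to its squared distance to the nearest already-chosen center."""
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    centers = [data[rng.integers(n)]]  # first center: uniform over the data
    for _ in range(k - 1):
        # Squared distance of every point to its nearest chosen center.
        d2 = np.min(
            ((data[:, None, :] - np.asarray(centers)[None, :, :]) ** 2).sum(-1),
            axis=1,
        )
        probs = d2 / d2.sum()  # D^2 weighting
        centers.append(data[rng.choice(n, p=probs)])
    return np.asarray(centers)
```

The distributed scheme studied in the paper must realize this biased sampling when no single user sees the joint dataset, which is what makes the extension nontrivial.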