🤖 AI Summary
This paper addresses single-round clustering in federated learning under local differential privacy (LDP) constraints with non-IID data. Method: We propose a non-iterative framework that combines gravitational potential fields with topological persistence: privatized client centroids are modeled as mass points in a potential field, and robust cluster centers are identified via persistent homology. We further design a compactness-aware client-side perturbation and a server-side topological aggregation step that jointly suppress noise and preserve structural integrity. Contribution/Results: We derive a closed-form theoretical bound linking the privacy budget ε to clustering error. Experiments on ten benchmark datasets show that our method outperforms existing single-round approaches, with state-of-the-art privacy-accuracy trade-offs in the strong-privacy regime (ε < 1).
📝 Abstract
Clustering non-independent and identically distributed (non-IID) data under local differential privacy (LDP) in federated settings presents a critical challenge: preserving privacy while maintaining accuracy without iterative communication. Existing one-shot methods rely on unstable pairwise centroid distances or neighborhood rankings, which degrade severely under strong LDP noise and data heterogeneity. We present Gravitational Federated Clustering (GFC), a privacy-preserving federated clustering approach that overcomes the limitations of distance-based methods when clients have differing privacy guarantees. GFC transforms privatized client centroids into a global gravitational potential field in which true cluster centers emerge as topologically persistent singularities. Our framework introduces two key innovations: (1) a client-side compactness-aware perturbation mechanism that encodes local cluster geometry as "mass" values, and (2) a server-side topological aggregation phase that extracts stable centroids through persistent homology analysis of the potential field's superlevel sets. Theoretically, we establish a closed-form bound relating the privacy budget $ε$ to centroid estimation error, and prove that the potential field's Lipschitz smoothing properties exponentially suppress noise in high-density regions. Empirically, GFC outperforms state-of-the-art methods on ten benchmarks, especially under strong LDP constraints ($ε < 1$), while maintaining comparable performance when the privacy constraint is weaker (larger $ε$). By reformulating federated clustering as a topological persistence problem in a synthetic physics-inspired space, GFC improves the privacy-accuracy trade-off without iterative communication, providing a new perspective for privacy-preserving distributed learning.
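
The two-stage pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and every name and parameter below (`client_report`, `potential`, `persistent_peaks`, `bandwidth`, `radius`, `min_persistence`) is hypothetical. Several simplifications are assumed: the Laplace noise scale is not calibrated to any data sensitivity, so this is not a valid LDP mechanism as written; the mass is sent unperturbed; a Gaussian kernel stands in for the gravitational potential; and persistent homology is reduced to 0-dimensional superlevel-set persistence on a neighborhood graph over the reported centroids, computed with union-find.

```python
import numpy as np

def client_report(points, epsilon, rng):
    """Client side (illustrative): noisy centroid plus a compactness-based mass."""
    centroid = points.mean(axis=0)
    spread = np.mean(np.linalg.norm(points - centroid, axis=1))
    mass = 1.0 / (1.0 + spread)  # assumption: tighter clusters get larger mass
    # Assumption: Laplace noise with scale 1/epsilon; sensitivity is not calibrated.
    noisy = centroid + rng.laplace(scale=1.0 / epsilon, size=centroid.shape)
    return noisy, mass

def potential(query, centroids, masses, bandwidth):
    """Server side: mass-weighted Gaussian field evaluated at the query points."""
    d2 = ((query[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return (masses[None, :] * np.exp(-d2 / (2.0 * bandwidth**2))).sum(axis=1)

def persistent_peaks(X, phi, radius, min_persistence):
    """Return points of X that are peaks of phi with persistence >= min_persistence.

    Points are visited from highest to lowest phi. A point with no already-visited
    neighbor within `radius` starts a new component (a peak is born). When a point
    joins several components, the lower peaks die there and their persistence is
    peak value minus the current value. Peaks that never merge keep infinite
    persistence.
    """
    n = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    parent = np.full(n, -1)  # -1 marks points not yet visited

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return int(i)

    persistence = {}
    for i in np.argsort(-phi):
        i = int(i)
        parent[i] = i
        roots = {find(j) for j in np.flatnonzero(dist[i] <= radius)
                 if j != i and parent[j] != -1}
        if not roots:
            persistence[i] = np.inf  # new peak
            continue
        best = max(roots, key=lambda r: phi[r])
        parent[i] = best
        for r in roots - {best}:
            parent[r] = best
            persistence[r] = phi[r] - phi[i]  # lower peak dies at this level
    keep = [r for r, p in persistence.items() if p >= min_persistence]
    return X[keep]
```

In this sketch the server would stack the clients' noisy centroids into `X`, evaluate `phi = potential(X, X, masses, bandwidth)`, and call `persistent_peaks(X, phi, radius, min_persistence)` to obtain candidate global centers. The persistence threshold plays the role of the stability criterion: low-persistence peaks, which are more likely to be noise artifacts, are discarded.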