🤖 AI Summary
This work addresses the problem of choosing a privacy measure for k-anonymous social network publishing. We propose a set of aspects for comparing structural, neighborhood-based measures: the type of desired privacy, the attacker scenario, data utility, the type of desired output, and computational complexity. Through theoretical characterization and empirical experiments on real-world networks with up to millions of edges, we show that the most effective measures consider a greater node vicinity yet use minimal structural information, and therefore minimal computational resources. The choice of measure strongly affects privacy, utility, and computational cost, and the resulting recommendations support an informed, reproducible choice of measure when sharing large-scale network data.
📝 Abstract
Privacy-aware sharing of network data is a difficult task due to the interconnectedness of individuals in networks. An important part of this problem is the inherently difficult question of how the privacy of an individual node should be measured in a particular situation. To that end, in this paper we propose a set of aspects that one should consider when choosing a measure for privacy. These aspects include the type of desired privacy and the attacker scenario against which the measure protects, the utility of the data, the type of desired output, and the computational complexity of the chosen measure. Based on these aspects, we provide a systematic overview of existing approaches in the literature. We then focus on a set of measures that serve our objective: sharing the anonymized full network dataset with limited disclosure risk. The considered measures, each based on the concept of k-anonymity, account for the structure of a node's surroundings and differ in the completeness and reach of the structural information they take into account. We present a comprehensive theoretical characterization as well as comparative empirical experiments on a wide range of real-world network datasets with up to millions of edges. We find that the choice of measure has an enormous effect on the aforementioned aspects. Most interestingly, we find that the most effective measures consider a greater node vicinity, yet utilize minimal structural information and thus use minimal computational resources. This finding has important implications for researchers and practitioners, who may, based on the recommendations given in this paper, make an informed choice on how to safely share large-scale network data in a privacy-aware manner.
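To make the idea of a structural k-anonymity measure concrete, the following is a minimal Python sketch, not the measures evaluated in the paper. It assumes a node is k-anonymous when at least k nodes (itself included) share its structural signature. The two signatures used here (a node's degree, and its degree combined with the sorted degrees of its neighbors) are illustrative assumptions, as are all function names and the toy graph.

```python
from collections import Counter, defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from an iterable of (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def degree_signature(adj, node):
    """Illustrative signature 1: the attacker knows only the node's degree."""
    return len(adj[node])


def neighbor_degree_signature(adj, node):
    """Illustrative signature 2: the node's degree plus its neighbors' degrees."""
    return (len(adj[node]), tuple(sorted(len(adj[n]) for n in adj[node])))


def k_anonymous_fraction(adj, signature, k):
    """Fraction of nodes whose signature is shared by at least k nodes."""
    sigs = {node: signature(adj, node) for node in adj}
    class_sizes = Counter(sigs.values())
    anonymous = sum(1 for node in adj if class_sizes[sigs[node]] >= k)
    return anonymous / len(adj)


# Hypothetical toy graph: a path a-b-c-d with an extra leaf e attached to b.
toy_edges = [("a", "b"), ("b", "c"), ("c", "d"), ("b", "e")]
toy_adj = build_adjacency(toy_edges)

print(k_anonymous_fraction(toy_adj, degree_signature, k=2))
print(k_anonymous_fraction(toy_adj, neighbor_degree_signature, k=2))
```

On this toy graph the 2-anonymous fraction drops from 0.6 under the degree signature to 0.4 under the neighbor-degree signature, because the richer signature separates leaf d from leaves a and e. This illustrates, in miniature, how the choice of measure changes which nodes count as protected.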