A systematic comparison of measures for k-anonymity in networks

📅 2024-07-02

🏛️ arXiv.org

📈 Citations: 1

✨ Influential: 0


🤖 AI Summary

This work addresses the question of which measure to use when publishing k-anonymous social network data. We propose a set of criteria for choosing a measure (the desired type of privacy, the attacker scenario, utility preservation, output format, and computational complexity) and use it to systematically compare measures based on the structure of a node's neighborhood. Through theoretical analysis and empirical evaluation on real-world networks with up to millions of edges, we show that the most effective measures consider a larger neighborhood around each node while using only minimal structural information, and therefore require few computational resources. The results show that the choice of measure strongly affects the privacy–utility–efficiency trade-off, and they lead to recommendations that help researchers and practitioners choose a measure for privacy-aware sharing of large-scale network data.


📝 Abstract

Privacy-aware sharing of network data is a difficult task due to the interconnectedness of individuals in networks. An important part of this problem is the inherently difficult question of how the privacy of an individual node should be measured in a particular situation. To that end, in this paper we propose a set of aspects that one should consider when choosing a measure for privacy. These aspects include the type of desired privacy and the attacker scenario against which the measure protects, the utility of the data, the type of desired output, and the computational complexity of the chosen measure. Based on these aspects, we provide a systematic overview of existing approaches in the literature. We then focus on a set of measures that ultimately enables our objective: sharing the anonymized full network dataset with limited disclosure risk. The considered measures, each based on the concept of k-anonymity, account for the structure of the surroundings of a certain node and differ in completeness and reach of the structural information taken into account. We present a comprehensive theoretical characterization as well as comparative empirical experiments on a wide range of real-world network datasets with up to millions of edges. We find that the choice of the measure has an enormous effect on the aforementioned aspects. Most interestingly, we find that the most effective measures consider a greater node vicinity, yet utilize minimal structural information and thus use minimal computational resources. This finding has important implications for researchers and practitioners, who may, based on the recommendations given in this paper, make an informed choice on how to safely share large-scale network data in a privacy-aware manner.
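The idea of a structural k-anonymity measure can be illustrated with a minimal sketch, assuming that a measure assigns each node a structural signature and that nodes with equal signatures are indistinguishable; a node is then k-anonymous when at least k nodes (itself included) share its signature. The two signatures used here, the node's degree and the node and edge counts of its d-hop neighborhood, are illustrative choices of differing reach and are not necessarily the exact measures evaluated in the paper. The toy graph and all function names are hypothetical.

```python
from collections import Counter, deque

# Hypothetical illustration: signature-based k-anonymity on a toy graph.

def build_adj(edges):
    """Undirected adjacency sets from an edge list (simple graph assumed)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def hop_counts(adj, source, d):
    """(#nodes, #edges) of the subgraph induced by nodes within d hops of source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == d:
            continue  # do not expand beyond the d-hop boundary
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    inside = set(dist)
    # Each induced edge is seen from both endpoints, hence the halving.
    n_edges = sum(1 for u in inside for v in adj[u] if v in inside) // 2
    return len(inside), n_edges

def class_sizes(adj, signature):
    """Size of each node's equivalence class under the given signature."""
    sig = {v: signature(v) for v in adj}
    counts = Counter(sig.values())
    return {v: counts[sig[v]] for v in adj}

def unique_nodes(sizes):
    """Nodes that are not even 2-anonymous (class size 1)."""
    return sorted(v for v, s in sizes.items() if s == 1)

edges = [(0, 1), (1, 2), (2, 3), (3, 4), (2, 5)]
adj = build_adj(edges)

by_degree = class_sizes(adj, lambda v: len(adj[v]))
by_hop2 = class_sizes(adj, lambda v: hop_counts(adj, v, 2))

print(unique_nodes(by_degree))  # [2]
print(unique_nodes(by_hop2))    # [2, 5]
```

On this toy graph, widening the reach from degree to the 2-hop counts also makes node 5 unique, even though the signature is only two integers. This is merely a small illustration of how reach can matter independently of structural detail, not evidence for the paper's empirical findings.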

Problem

Comparing k-anonymity measures for social network privacy

Evaluating structural information impact on anonymity levels

Assessing privacy-utility trade-offs in data publishing
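The effect of structural completeness at a fixed reach can be sketched in the same spirit. The example below compares two signatures that both look only at a node's direct neighbors, namely its degree alone and the sorted degrees of its neighbors, and reports the share of nodes that are k-anonymous for a few values of k. Both signatures, the toy graph, and the function names are hypothetical illustrations, not the measures defined in the paper.

```python
from collections import Counter

# Hypothetical illustration: more complete structural information at the same reach.

def build_adj(edges):
    """Undirected adjacency sets from an edge list (simple graph assumed)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def class_sizes(adj, signature):
    """Size of each node's equivalence class under the given signature."""
    sig = {v: signature(v) for v in adj}
    counts = Counter(sig.values())
    return {v: counts[sig[v]] for v in adj}

def share_k_anonymous(sizes, k):
    """Fraction of nodes whose equivalence class has at least k members."""
    return sum(1 for s in sizes.values() if s >= k) / len(sizes)

edges = [(0, 1), (1, 2), (2, 3), (3, 4), (2, 5)]
adj = build_adj(edges)

# Less complete: only the node's own degree.
by_degree = class_sizes(adj, lambda v: len(adj[v]))
# More complete, same 1-hop reach: sorted degrees of the node's neighbors.
by_nbr_degrees = class_sizes(
    adj, lambda v: tuple(sorted(len(adj[u]) for u in adj[v]))
)

for k in (2, 3):
    print(k, share_k_anonymous(by_degree, k), share_k_anonymous(by_nbr_degrees, k))
```

Here the richer signature lowers the share of 3-anonymous nodes from one half to zero, showing how additional structural detail splits equivalence classes. On its own, this says nothing about which measure is preferable for real data.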

Innovation

Systematic comparison of k-anonymity measures

Theoretical characterization of anonymity measures

Empirical evaluation on real-world network datasets

🔎 Similar Papers

The anonymization problem in social networks