🤖 AI Summary
This work addresses relational learning on attributed graphs under entity-level differential privacy (DP), tackling two key challenges: (i) high and ill-defined global sensitivity arising from entities participating in multiple relations, and (ii) the breakdown of classical privacy amplification analysis due to multi-stage coupled sampling. To resolve these, we establish the first formal theoretical framework for entity-level DP in relational settings. We propose a frequency-aware adaptive gradient clipping mechanism that tightly bounds sensitivity based on entity occurrence counts. Furthermore, we generalize the privacy amplification theorem to a tractable subclass of coupled sampling in which sample sizes are coupled across stages. Empirically, fine-tuning encoder models on text-attributed networks demonstrates substantial improvements in the privacy–utility trade-off. Experiments with our open-source implementation validate the method's effectiveness, rigorous DP guarantees, and scalability on real-world relational datasets.
📝 Abstract
Learning with relational and network-structured data is increasingly vital in sensitive domains where protecting the privacy of individual entities is paramount. Differential Privacy (DP) offers a principled approach for quantifying privacy risks, with DP-SGD emerging as a standard mechanism for private model training. However, directly applying DP-SGD to relational learning is challenging due to two key factors: (i) entities often participate in multiple relations, resulting in high and difficult-to-control sensitivity; and (ii) relational learning typically involves multi-stage, potentially coupled (interdependent) sampling procedures that make standard privacy amplification analyses inapplicable. This work presents a principled framework for relational learning with formal entity-level DP guarantees. We provide a rigorous sensitivity analysis and introduce an adaptive gradient clipping scheme that modulates clipping thresholds based on entity occurrence frequency. We also extend the privacy amplification results to a tractable subclass of coupled sampling, where the dependence arises only through sample sizes. These contributions lead to a tailored DP-SGD variant for relational data with provable privacy guarantees. Experiments on fine-tuning text encoders over text-attributed network-structured relational data demonstrate the strong utility-privacy trade-offs of our approach. Our code is available at https://github.com/Graph-COM/Node_DP.
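To make the adaptive clipping idea concrete, here is a minimal sketch of a DP-SGD-style step in which each per-example gradient is clipped to a threshold that shrinks with the owning entity's occurrence count. The specific scaling rule `C_i = C / f_i` is one illustrative choice (it caps any single entity's total contribution at `C`); the paper's actual mechanism, frequency estimation, and noise calibration may differ.

```python
import numpy as np

def adaptive_clip(per_example_grads, entity_freqs, base_clip=1.0):
    """Clip each per-example gradient to C_i = base_clip / f_i, where
    f_i is the occurrence count of the entity owning that example.
    (Hypothetical scaling rule for illustration only.)"""
    clipped = []
    for g, f in zip(per_example_grads, entity_freqs):
        c = base_clip / f  # tighter threshold for frequent entities
        norm = np.linalg.norm(g)
        # Standard norm clipping: rescale only if the norm exceeds c.
        clipped.append(g * min(1.0, c / norm) if norm > 0 else g)
    return clipped

def dp_sgd_step(per_example_grads, entity_freqs, base_clip=1.0,
                noise_multiplier=1.0, rng=None):
    """One private step: clip adaptively, sum, add Gaussian noise.
    With C_i = base_clip / f_i, an entity appearing f_i times
    contributes at most f_i * C_i = base_clip to the summed gradient,
    so entity-level sensitivity is bounded by base_clip."""
    rng = rng or np.random.default_rng(0)
    clipped = adaptive_clip(per_example_grads, entity_freqs, base_clip)
    total = np.sum(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * base_clip,
                       size=total.shape)
    return (total + noise) / len(per_example_grads)
```

For example, a gradient of norm 5.0 from an entity appearing twice is clipped to norm 0.5 under `base_clip=1.0`, while a gradient of norm 0.5 from a singleton entity passes through unchanged. The formal privacy accounting (amplification under coupled sampling) is analyzed in the paper and not reproduced here.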