Information Leakage in Data Linkage

📅 2025-05-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Cross-organizational privacy-preserving record linkage (PPRL) remains vulnerable to sensitive attribute reconstruction in practice, posing real-world privacy risks despite its theoretical guarantees.

Method: This study systematically identifies implicit information leakage pathways in PPRL protocols (including Bloom filters and secure multi-party computation) from the organizational practitioner's perspective. We employ threat modeling, information inference analysis, protocol reverse auditing, and sensitive attribute recoverability assessment to uncover multiple channels through which participants may unintentionally infer sensitive attributes during legitimate data exchanges.

Contribution/Results: We propose a risk identification and mitigation framework tailored for data custodians, bridging the gap between PPRL protocol design and real-world security evaluation. The work yields an actionable security checklist and concrete mitigation recommendations, which have been adopted by multiple health data collaboration initiatives. This represents the first systematic, practice-oriented security analysis of PPRL deployments, advancing both theoretical understanding and operational resilience of privacy-enhancing technologies.

📝 Abstract
The process of linking databases that contain sensitive information about individuals across organisations is an increasingly common requirement in the health and social science research domains, as well as with governments and businesses. To protect personal data, protocols have been developed to limit the leakage of sensitive information. Furthermore, privacy-preserving record linkage (PPRL) techniques have been proposed to conduct linkage on encoded data. While PPRL techniques are now being employed in real-world applications, the focus of PPRL research has been on the technical aspects of linking sensitive data (such as encoding methods and cryptanalysis attacks), but not on organisational challenges when employing such techniques in practice. We analyse what sensitive information can possibly leak, either unintentionally or intentionally, in traditional data linkage as well as PPRL protocols, and what a party that participates in such a protocol can learn from the data it obtains legitimately within the protocol. We also show that PPRL protocols can still result in the unintentional leakage of sensitive information. We provide recommendations to help data custodians and other parties involved in a data linkage project to identify and prevent vulnerabilities and make their project more secure.
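
To make concrete how linkage on encoded data can still leak information, the following is a minimal sketch of one widely used family of PPRL encodings, Bloom filters over character bigrams. It is not the protocol analysed in the paper: the function names, the `secret` key, and the parameters `num_bits` and `num_hashes` are all assumptions chosen for illustration. The sketch shows that the encoding is deterministic, so a party that legitimately receives the encoded records can still count repeated bit patterns and compare similarities between them.

```python
import hashlib
from collections import Counter


def bigrams(value):
    """Split a string into its set of overlapping 2-character substrings."""
    padded = f"_{value.lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def bloom_encode(value, secret, num_bits=1024, num_hashes=4):
    """Return the set of bit positions set for `value` (a sparse Bloom filter).

    Hypothetical keyed encoding: each bigram is hashed `num_hashes` times
    together with a secret shared by the data custodians.
    """
    positions = set()
    for gram in bigrams(value):
        for k in range(num_hashes):
            digest = hashlib.sha256(f"{secret}|{k}|{gram}".encode()).digest()
            positions.add(int.from_bytes(digest[:8], "big") % num_bits)
    return frozenset(positions)


def dice(a, b):
    """Dice coefficient between two sets of bit positions."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


# Toy data: the same surname appears three times.
names = ["smith", "smith", "smyth", "jones", "smith"]
encoded = [bloom_encode(n, secret="shared-key") for n in names]

# Equal inputs give equal bit patterns, so value frequencies survive encoding.
pattern_counts = Counter(encoded)
most_common_pattern, most_common_count = pattern_counts.most_common(1)[0]

# Similar inputs give similar bit patterns, which is what enables linkage
# but also lets the receiving party reason about the underlying values.
sim_close = dice(encoded[0], encoded[2])  # smith vs smyth
sim_far = dice(encoded[0], encoded[3])    # smith vs jones
```

In this toy data set the most frequent bit pattern corresponds to the most frequent surname, which a party with access to a public name frequency table could attempt to re-identify without ever seeing the secret key.
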
Problem

Research questions and friction points this paper is trying to address.

Analyzing information leakage in traditional and PPRL data linkage
Identifying unintentional sensitive data exposure in PPRL protocols
Providing recommendations to secure data linkage projects
Innovation

Methods, ideas, or system contributions that make the work stand out.

Privacy-preserving record linkage (PPRL) techniques
Analyzing sensitive information leakage risks
Recommendations to prevent data vulnerabilities
Peter Christen
Professor, Australian National University, and University of Edinburgh
Record linkage, entity resolution, data privacy, administrative data, data quality
Rainer Schnell
Methodology Research Group, University Duisburg-Essen, Duisburg, Germany
Anushka Vidanage
School of Computing, Australian National University, Canberra, Australia