Stronger Re-identification Attacks through Reasoning and Aggregation

2025-10-10
Citations: 0
Influential: 0
AI Summary
This paper addresses the difficulty of measuring how well text de-identification techniques protect identities by strengthening the re-identification attacks used to assess them. It combines two strategies into an automated attack pipeline: aggregating predictions across multiple orderings of the masked PII spans, and using a reasoning language model with access to external background knowledge. This more realistically simulates how adversaries leverage background knowledge to recover PII. Compared to conventional single-prediction or black-box attacks, the framework significantly improves re-identification accuracy (average +18.7%), especially when the adversary has extensive background knowledge. The study exposes vulnerabilities in current de-identification methods and offers a reproducible, quantitative way to evaluate them, supporting standardized benchmarking and iterative refinement of privacy-preserving technologies.

๐Ÿ“ Abstract
Text de-identification techniques are often used to mask personally identifiable information (PII) from documents. Their ability to conceal the identity of the individuals mentioned in a text is, however, hard to measure. Recent work has shown how the robustness of de-identification methods could be assessed by attempting the reverse process of _re-identification_, based on an automated adversary using its background knowledge to uncover the PIIs that have been masked. This paper presents two complementary strategies to build stronger re-identification attacks. We first show that (1) the _order_ in which the PII spans are re-identified matters, and that aggregating predictions across multiple orderings leads to improved results. We also find that (2) reasoning models can boost the re-identification performance, especially when the adversary is assumed to have access to extensive background knowledge.
Problem

Research questions and friction points this paper is trying to address.

Evaluating de-identification robustness via re-identification attacks
Improving attack strength through ordered aggregation strategies
Enhancing re-identification using reasoning models with background knowledge
Innovation

Methods, ideas, or system contributions that make the work stand out.

Aggregating predictions across multiple re-identification orderings
Using reasoning models to boost re-identification performance
Leveraging extensive background knowledge for stronger attacks
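
The first idea above, aggregating predictions across multiple re-identification orderings, can be illustrated with a minimal sketch. This is not the authors' implementation: the listing does not state how predictions are combined, so the majority vote used here is an assumption, and every name (`reidentify_in_order`, `aggregate_over_orderings`, `toy_predict`, the span labels) is hypothetical. The toy predictor stands in for a real model whose guess for one span depends on which other spans have already been resolved.

```python
from collections import Counter
from itertools import permutations
from typing import Callable, Dict, Iterable, Sequence

# Hypothetical predictor signature: given a masked span and the guesses
# already made for earlier spans, return a guess for this span.
Predictor = Callable[[str, Dict[str, str]], str]


def reidentify_in_order(spans: Sequence[str], predict: Predictor) -> Dict[str, str]:
    """Re-identify spans one at a time, passing earlier guesses to later calls."""
    guesses: Dict[str, str] = {}
    for span in spans:
        guesses[span] = predict(span, dict(guesses))
    return guesses


def aggregate_over_orderings(
    orderings: Iterable[Sequence[str]], predict: Predictor
) -> Dict[str, str]:
    """Run the attack once per ordering and keep the most frequent guess per span.

    Majority voting is an assumed aggregation rule, chosen for illustration.
    """
    votes: Dict[str, Counter] = {}
    for order in orderings:
        for span, guess in reidentify_in_order(order, predict).items():
            votes.setdefault(span, Counter())[guess] += 1
    return {span: counts.most_common(1)[0][0] for span, counts in votes.items()}


def toy_predict(span: str, context: Dict[str, str]) -> str:
    """Toy stand-in for a model: the person is only guessed correctly
    once at least one other span has already been resolved."""
    if span == "PERSON_1":
        return "Alice" if context else "Bob"
    return {"CITY_1": "Oslo", "ORG_1": "Acme"}[span]


if __name__ == "__main__":
    spans = ["PERSON_1", "CITY_1", "ORG_1"]
    # A single ordering that starts with the person yields the wrong guess.
    print(reidentify_in_order(spans, toy_predict))
    # Voting over all six orderings recovers the majority answer.
    print(aggregate_over_orderings(permutations(spans), toy_predict))
```

In this toy setting the person span is resolved first in only two of the six orderings, so the vote settles on the guess made with context available. With many spans, enumerating every permutation is infeasible, and sampling a fixed number of random orderings would be the natural substitute.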