Stronger Re-identification Attacks through Reasoning and Aggregation

📅 2025-10-10

🤖 AI Summary

This paper addresses the difficulty of measuring how well text de-identification techniques conceal identities, and proposes a robustness assessment framework based on re-identification attacks. The method combines two components: aggregation of predictions made across multiple orderings of the masked PII spans, and a reasoning language model augmented with external background knowledge. Together they form an automated adversarial attack pipeline that more realistically simulates how adversaries use background knowledge to recover PII. Compared to conventional single-prediction or black-box attacks, the framework improves re-identification accuracy (average +18.7%), especially when the adversary has extensive background knowledge. The study exposes vulnerabilities in current de-identification methods and offers a reproducible, quantitative way to evaluate them, which supports standardized benchmarking and iterative refinement of privacy-preserving technologies.

📝 Abstract

Text de-identification techniques are often used to mask personally identifiable information (PII) from documents. Their ability to conceal the identity of the individuals mentioned in a text is, however, hard to measure. Recent work has shown how the robustness of de-identification methods could be assessed by attempting the reverse process of _re-identification_, based on an automated adversary using its background knowledge to uncover the PIIs that have been masked. This paper presents two complementary strategies to build stronger re-identification attacks. We first show that (1) the _order_ in which the PII spans are re-identified matters, and that aggregating predictions across multiple orderings leads to improved results. We also find that (2) reasoning models can boost the re-identification performance, especially when the adversary is assumed to have access to extensive background knowledge.
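The first strategy, aggregating predictions across several re-identification orderings, can be illustrated with a minimal sketch. The abstract does not say how predictions are aggregated, so the majority vote below is an assumption, not the authors' method. The names `reidentify_in_order`, `aggregate_over_orderings`, and `toy_predict` are hypothetical, and the toy predictor stands in for whatever model the adversary would actually use.

```python
import random
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical predictor signature (an assumption for this sketch):
# given a masked span id and the spans resolved so far, return a guess.
Predictor = Callable[[str, Dict[str, str]], str]


def reidentify_in_order(order: List[str], predict: Predictor) -> Dict[str, str]:
    """Resolve spans one by one; later guesses can see earlier ones."""
    resolved: Dict[str, str] = {}
    for span_id in order:
        resolved[span_id] = predict(span_id, dict(resolved))
    return resolved


def aggregate_over_orderings(
    span_ids: List[str],
    predict: Predictor,
    n_orderings: int = 5,
    seed: int = 0,
) -> Dict[str, str]:
    """Run the attack under several random orderings and majority-vote
    each span's guess. Majority voting is an assumed aggregation rule."""
    rng = random.Random(seed)
    votes: Dict[str, Counter] = {s: Counter() for s in span_ids}
    for _ in range(n_orderings):
        order = list(span_ids)
        rng.shuffle(order)
        for span_id, guess in reidentify_in_order(order, predict).items():
            votes[span_id][guess] += 1
    return {s: counts.most_common(1)[0][0] for s, counts in votes.items()}


def toy_predict(span_id: str, resolved: Dict[str, str]) -> str:
    """Toy stand-in for a model: the CITY guess depends on whether
    PERSON was already resolved, which makes the ordering matter."""
    if span_id == "PERSON":
        return "Alice"
    return "Oslo" if "PERSON" in resolved else "Bergen"


if __name__ == "__main__":
    print(reidentify_in_order(["PERSON", "CITY"], toy_predict))
    print(reidentify_in_order(["CITY", "PERSON"], toy_predict))
    print(aggregate_over_orderings(["PERSON", "CITY"], toy_predict))
```

In the toy case the two orderings give different guesses for `CITY`, which is the effect the abstract describes: a single fixed ordering commits to one of them, while voting over several orderings lets the more frequently supported guess win.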

Problem

Research questions and friction points this paper is trying to address.

Evaluating de-identification robustness via re-identification attacks

Improving attack strength through ordered aggregation strategies

Enhancing re-identification using reasoning models with background knowledge

Innovation

Methods, ideas, or system contributions that make the work stand out.

Aggregating predictions across multiple re-identification orderings

Using reasoning models to boost re-identification performance

Leveraging extensive background knowledge for stronger attacks
