Agentic LLMs as Powerful Deanonymizers: Re-identification of Participants in the Anthropic Interviewer Dataset

📅 2026-01-09
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
🤖 AI Summary
This study demonstrates that publicly released qualitative interview datasets are vulnerable to re-identification attacks in the era of large language models (LLMs). It shows, for the first time, that general-purpose LLM agents can carry out complex, automated re-identification with a low technical barrier: guided by natural-language prompts, the agents combine web search, information extraction, and multi-hop reasoning, decomposing the attack into seemingly innocuous subtasks. Applied to the scientist subset of the Anthropic Interviewer dataset, the method linked 6 of 24 interviews to specific research outputs and their authors, in some cases achieving unique one-to-one matches with the interviewees. These findings expose the fragility of current privacy-preserving mechanisms against intelligent, agent-based inference.

📝 Abstract
On December 4, 2025, Anthropic released Anthropic Interviewer, an AI tool for running qualitative interviews at scale, along with a public dataset of 1,250 interviews with professionals, including 125 scientists, about their use of AI for research. Focusing on the scientist subset, I show that widely available LLMs with web search and agentic capabilities can link six out of twenty-four interviews to specific scientific works, recovering associated authors and, in some cases, uniquely identifying the interviewees. My contribution is to show that modern LLM-based agents make such re-identification attacks easy and low-effort: off-the-shelf tools can, with a few natural-language prompts, search the web, cross-reference details, and propose likely matches, effectively lowering the technical barrier. Existing safeguards can be bypassed by breaking down the re-identification into benign tasks. I outline the attack at a high level, discuss implications for releasing rich qualitative data in the age of LLM agents, and propose mitigation recommendations and open problems. I have notified Anthropic of my findings.
Problem

Research questions and friction points this paper is trying to address.

deanonymization
re-identification
LLM agents
privacy
qualitative data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Agentic LLMs
Re-identification
Deanonymization
Qualitative data privacy
AI ethics