Maximum Coverage in Turnstile Streams with Applications to Fingerprinting Measures

📅 2025-04-25
🤖 AI Summary
This paper studies the maximum coverage problem in the turnstile streaming model: given a stream of updates that insert or delete items from $d$ candidate subsets of a universe $[n]$, select $k$ subsets whose union is as large as possible. It extends this framework to re-identification risk analysis by identifying high-risk fingerprint features. We propose the first turnstile streaming algorithm for maximum coverage, with update time polylogarithmic in $n$, integrating estimation of the complement of the $p$-th frequency moment (for $p \geq 2$), sketch-based compression, streaming hash functions, and hierarchical sampling. We also develop two risk identification frameworks, targeted (leveraging known target fingerprints) and general (requiring no prior knowledge), both achieving theoretically optimal approximation ratios. Empirically, our fingerprinting algorithms run up to $210\times$ faster than prior approaches while retaining their theoretical guarantees.

📝 Abstract
In the maximum coverage problem we are given $d$ subsets from a universe $[n]$, and the goal is to output $k$ subsets such that their union covers the largest possible number of distinct items. We present the first algorithm for maximum coverage in the turnstile streaming model, where updates which insert or delete an item from a subset come one-by-one. Notably our algorithm only uses $\mathrm{polylog}\, n$ update time. We also present turnstile streaming algorithms for targeted and general fingerprinting for risk management where the goal is to determine which features pose the greatest re-identification risk in a dataset. As part of our work, we give a result of independent interest: an algorithm to estimate the complement of the $p^{\text{th}}$ frequency moment of a vector for $p \geq 2$. Empirical evaluation confirms the practicality of our fingerprinting algorithms demonstrating a speedup of up to $210\times$ over prior work.
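
To make the problem statement concrete, here is a minimal Python sketch of maximum coverage under insert/delete updates. It is not the paper's algorithm: it stores every subset explicitly (so it uses linear space, with no sketching) and then runs the classical offline greedy heuristic, which is known to give a $(1 - 1/e)$-approximation. The names `TurnstileSets` and `greedy_max_coverage` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class TurnstileSets:
    """Explicitly stores d subsets of a universe under insert/delete updates.

    Illustration only: a real turnstile streaming algorithm would keep a
    small sketch instead of the full subsets.
    """

    def __init__(self, d):
        # counts[j][item] = net number of insertions of item into subset j
        self.counts = [defaultdict(int) for _ in range(d)]

    def update(self, j, item, delta):
        """Apply one turnstile update: delta = +1 inserts, delta = -1 deletes."""
        c = self.counts[j]
        c[item] += delta
        if c[item] == 0:
            del c[item]

    def subsets(self):
        """Return the current subsets as Python sets."""
        return [{x for x, c in cnt.items() if c > 0} for cnt in self.counts]


def greedy_max_coverage(subsets, k):
    """Classical offline greedy: repeatedly pick the subset covering the most new items."""
    covered, chosen = set(), []
    for _ in range(min(k, len(subsets))):
        best = max(
            (j for j in range(len(subsets)) if j not in chosen),
            key=lambda j: len(subsets[j] - covered),
        )
        chosen.append(best)
        covered |= subsets[best]
    return chosen, len(covered)


stream = TurnstileSets(d=3)
for j, item in [(0, 1), (0, 2), (0, 3), (1, 3), (1, 4), (2, 4), (2, 5), (2, 6)]:
    stream.update(j, item, +1)
stream.update(0, 3, -1)  # delete item 3 from subset 0

chosen, size = greedy_max_coverage(stream.subsets(), k=2)
print(chosen, size)  # subsets 2 and 0 together cover 5 items
```

The deletion is what separates the turnstile model from insertion-only streams: after item 3 is removed from subset 0, the coverage of any solution containing subset 0 can shrink, so a summary built from insertions alone would be stale.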

Problem

Research questions and friction points this paper is trying to address.

Solving maximum coverage in the turnstile streaming model
Developing turnstile streaming algorithms for targeted and general fingerprinting for risk management
Efficiently estimating the complement of the $p$-th frequency moment for $p \geq 2$
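
For readers unfamiliar with frequency moments, the sketch below computes the standard $p$-th frequency moment $F_p = \sum_i |x_i|^p$ of a vector $x$ defined by a turnstile stream of `(index, delta)` updates. It is exact and stores the whole vector, so it is only a reference definition, not the paper's estimator; the complement that the paper estimates is not defined in this summary and is therefore not shown. The function name `frequency_moment` is a hypothetical one used for illustration.

```python
from collections import defaultdict


def frequency_moment(updates, p):
    """Exact F_p = sum_i |x_i|^p of the vector built from (index, delta) updates."""
    x = defaultdict(int)
    for i, delta in updates:
        x[i] += delta  # turnstile update: delta may be positive or negative
    return sum(abs(v) ** p for v in x.values())


# Coordinate 0 ends at 3, coordinate 1 at 1, coordinate 2 at 0.
updates = [(0, +1), (0, +1), (1, +1), (2, +1), (2, -1), (0, +1)]
f2 = frequency_moment(updates, p=2)
print(f2)  # 3**2 + 1**2 + 0**2 = 10
```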

Innovation

Methods, ideas, or system contributions that make the work stand out.

First turnstile streaming algorithm for maximum coverage
Update time polylogarithmic in $n$
Algorithm for estimating the complement of the $p$-th frequency moment for $p \geq 2$