Tight Bounds for Answering Adaptively Chosen Concentrated Queries

📅 2025-07-18

🤖 AI Summary

This work studies the limits of answering adaptively chosen concentrated queries when the samples in a dataset may be correlated. In the concentrated queries framework of Bassily and Freund [2016], all known algorithms answer at most $O(n)$ adaptively chosen queries on a sample of size $n$, far fewer than the $O(n^2)$ achievable when samples are independent. The paper proves that, under some natural conditions on the algorithm, this gap is inherent to the current formulation of the framework. It also presents a simplified version of the best-known algorithms, which matches this impossibility result. Together, the results identify a limitation that correlated data places on adaptive data analysis within this framework and give a benchmark for future formulations of it.

📝 Abstract

Most work on adaptive data analysis assumes that samples in the dataset are independent. When correlations are allowed, even the non-adaptive setting can become intractable, unless some structural constraints are imposed. To address this, Bassily and Freund [2016] introduced the elegant framework of concentrated queries, which requires the analyst to restrict itself to queries that are concentrated around their expected value. While this assumption makes the problem trivial in the non-adaptive setting, in the adaptive setting it remains quite challenging. In fact, all known algorithms in this framework support significantly fewer queries than in the independent case: At most $O(n)$ queries for a sample of size $n$, compared to $O(n^2)$ in the independent setting. In this work, we prove that this utility gap is inherent under the current formulation of the concentrated queries framework, assuming some natural conditions on the algorithm. Additionally, we present a simplified version of the best-known algorithms that match our impossibility result.
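To make "queries that are concentrated around their expected value" concrete, here is a minimal Python sketch under stated assumptions. It treats a query q as a function of the whole sample S and calls it (ε, δ)-concentrated if Pr[|q(S) - E[q(S)]| > ε] ≤ δ. This formalization, the AR(1) data source, and the names `sample_correlated` and `empirical_concentration` are hypothetical illustrations, not the paper's definitions or algorithm. The sketch only estimates δ by simulation; it does not answer queries adaptively.

```python
import random
import statistics
from typing import Callable, List

# Hypothetical type: a query maps a whole sample S to a real number.
Query = Callable[[List[float]], float]


def sample_correlated(n: int, rho: float, rng: random.Random) -> List[float]:
    """Hypothetical correlated data source: a stationary AR(1) chain.

    x_{t+1} = rho * x_t + sqrt(1 - rho^2) * z_t with z_t ~ N(0, 1), so every
    x_t is N(0, 1) but consecutive samples have correlation rho.
    """
    scale = (1.0 - rho * rho) ** 0.5
    x = rng.gauss(0.0, 1.0)
    sample = []
    for _ in range(n):
        sample.append(x)
        x = rho * x + scale * rng.gauss(0.0, 1.0)
    return sample


def empirical_concentration(query: Query, n: int, rho: float, eps: float,
                            trials: int = 500, seed: int = 0) -> float:
    """Monte Carlo estimate of delta = Pr[|q(S) - E[q(S)]| > eps].

    E[q(S)] is approximated by the average of q over all simulated samples,
    so the result is only an empirical stand-in for the exact probability.
    """
    rng = random.Random(seed)
    values = [query(sample_correlated(n, rho, rng)) for _ in range(trials)]
    center = statistics.fmean(values)
    deviations = sum(1 for v in values if abs(v - center) > eps)
    return deviations / trials


if __name__ == "__main__":
    mean_query: Query = statistics.fmean   # averages all n samples
    first_query: Query = lambda s: s[0]    # reads a single sample only
    print(empirical_concentration(mean_query, n=500, rho=0.5, eps=0.3))
    print(empirical_concentration(first_query, n=500, rho=0.5, eps=0.3))
```

With mildly correlated samples (ρ = 0.5), the sample mean should deviate by more than ε only rarely, while the query that reads a single sample is far from concentrated. Restricting the analyst to queries of the first kind is the kind of structural constraint the framework imposes.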

Problem

Bounds for answering adaptively chosen concentrated queries

Intractability of adaptive analysis when dataset samples are correlated

Utility gap in the concentrated queries framework: $O(n)$ versus $O(n^2)$ queries

Innovation

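Impossibility result showing that the $O(n)$ utility gap is inherent in the current concentrated queries framework

Holds under natural conditions on the algorithm, within the framework's structural constraints on correlated data

Simplified version of the best-known algorithms that matches the impossibility result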