Clustering with Non-adaptive Subset Queries

📅 2024-09-17
🏛️ Neural Information Processing Systems
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper studies recovering a hidden clustering with non-adaptive subset queries. All queries must be fixed in advance, independent of earlier answers, and each query returns the number of clusters intersecting a given subset. Subset queries replace pairwise same-cluster queries in order to break the Ω(n²) lower bound that holds for non-adaptive pairwise schemes. We propose the first non-adaptive algorithms for this setting, based on multi-scale coding and hierarchical estimation, and also treat two variants: queries of bounded size and balanced clusters. The main algorithm makes O(n log k (log k + log log n)²) queries, which becomes O(n log log n) when k is constant; for balanced clusters, O(n log k) + Õ(k) queries suffice. When every query has size at most s, Ω(max(n²/s², n)) queries are necessary, and algorithms making Õ(n²k/s²) queries (for s ≤ √n) and Õ(n²/s) queries (for s ≤ n) are given. Separately, allowing two rounds of adaptivity reduces the cost to O(n log k) queries in general and O(n log log k) for balanced clusters. The core techniques combine combinatorial coding and hierarchical hash sampling, giving, for the first time, non-adaptive clustering recovery with subquadratic query complexity.

📝 Abstract
Recovering the underlying clustering of a set $U$ of $n$ points by asking pairwise same-cluster queries has garnered significant interest in the last decade. Given a query $S \subset U$, $|S|=2$, the oracle returns yes if the points are in the same cluster and no otherwise. For adaptive algorithms with pairwise queries, the number of required queries is known to be $\Theta(nk)$, where $k$ is the number of clusters. However, non-adaptive schemes require $\Omega(n^2)$ queries, which matches the trivial $O(n^2)$ upper bound attained by querying every pair of points. To break the quadratic barrier for non-adaptive queries, we study a generalization of this problem to subset queries for $|S|>2$, where the oracle returns the number of clusters intersecting $S$. Allowing for subset queries of unbounded size, $O(n)$ queries are possible with an adaptive scheme (Chakrabarty-Liao, 2024). However, the realm of non-adaptive algorithms is completely unknown. In this paper, we give the first non-adaptive algorithms for clustering with subset queries. Our main result is a non-adaptive algorithm making $O(n \log k \cdot (\log k + \log\log n)^2)$ queries, which improves to $O(n \log\log n)$ when $k$ is a constant. We also consider algorithms with a restricted query size of at most $s$. In this setting we prove that $\Omega(\max(n^2/s^2, n))$ queries are necessary and obtain algorithms making $\tilde{O}(n^2 k/s^2)$ queries for any $s \leq \sqrt{n}$ and $\tilde{O}(n^2/s)$ queries for any $s \leq n$. We also consider the natural special case when the clusters are balanced, obtaining non-adaptive algorithms which make $O(n \log k) + \tilde{O}(k)$ and $O(n \log^2 k)$ queries. Finally, allowing two rounds of adaptivity, we give an algorithm making $O(n \log k)$ queries in the general case and $O(n \log\log k)$ queries when the clusters are balanced.
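The oracle model and the trivial non-adaptive baseline can be sketched in a few lines of Python. This is a minimal illustration under assumptions: the hidden clustering is represented as a list `labels` of cluster ids, and `make_oracle` and `recover_pairwise` are hypothetical helper names, not code from the paper. The sketch fixes all n(n-1)/2 pair queries before seeing any answer (which is what makes it non-adaptive), reads an answer of 1 as "same cluster", and merges points with union-find. This is the O(n²) scheme that the subset-query algorithms aim to beat.

```python
from itertools import combinations


def make_oracle(labels):
    """Hypothetical subset-query oracle: labels[x] is the hidden cluster id of
    point x; a query S returns the number of distinct clusters intersecting S."""
    def count(S):
        return len({labels[x] for x in S})
    return count


def recover_pairwise(n, count):
    """Trivial non-adaptive scheme: all pairs are chosen up front, then pairs
    answered with count == 1 (same cluster) are merged via union-find."""
    queries = list(combinations(range(n), 2))  # fixed before any answer is seen
    answers = [count(q) for q in queries]      # could be issued as one batch
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (u, v), a in zip(queries, answers):
        if a == 1:
            parent[find(u)] = find(v)

    clusters = {}
    for x in range(n):
        clusters.setdefault(find(x), []).append(x)
    return sorted(clusters.values()), len(queries)
```

For example, with `labels = [0, 1, 0, 2, 1]`, `recover_pairwise(5, make_oracle(labels))` returns `([[0, 2], [1, 4], [3]], 10)`: the three clusters are recovered, but only after all 10 pairs have been queried.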
Problem

Research questions and friction points this paper is trying to address.

Non-adaptive clustering with subset queries beyond pairwise limitations
Breaking the quadratic query barrier for non-adaptive clustering algorithms
Developing efficient subset query strategies for cluster recovery
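The quadratic barrier, and the $n^2/s^2$ term of the size-restricted lower bound, can be explained by a simple covering argument. This is an illustrative sketch and not necessarily the proof used in the paper. If no query contains both $u$ and $v$, then the clustering into $n$ singletons and the clustering that merges only $u$ and $v$ give identical answers to every query, since each such $S$ returns $|S|$ in both cases. A non-adaptive scheme must therefore place every pair inside some query. With $m$ queries of size at most $s$:

$$
m \binom{s}{2} \;\ge\; \binom{n}{2}
\quad\Longrightarrow\quad
m \;\ge\; \frac{n(n-1)}{s(s-1)} \;=\; \Omega\!\left(\frac{n^2}{s^2}\right).
$$

Setting $s = 2$ gives $m \ge n(n-1)/2$, meaning every pair must be queried, which is the $\Omega(n^2)$ bound for non-adaptive pairwise schemes. This argument does not account for the $\Omega(n)$ term.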
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses non-adaptive subset queries for clustering
Employs queries that return the number of clusters intersecting the queried set
Achieves O(n log k) queries with two rounds of adaptivity
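To see why a cluster count over a larger set carries more information than a pairwise yes/no answer, here is a simple fully adaptive scheme in Python. It is an illustrative sketch only. It is not the paper's non-adaptive or two-round algorithm, and `make_oracle` and `recover_adaptive` are hypothetical names. The scheme keeps one representative per discovered cluster. A single query on the representatives plus a new point x reveals whether x opens a new cluster; otherwise, a binary search over the representatives finds its cluster. This costs 1 + ⌈log₂ k⌉ queries per point, or O(n log k) in total, but it needs many sequential rounds of adaptivity.

```python
def make_oracle(labels):
    """Hypothetical oracle: labels[x] is the hidden cluster id of point x;
    a query S returns the number of distinct clusters intersecting S."""
    def count(S):
        return len({labels[x] for x in S})
    return count


def recover_adaptive(n, count):
    """Illustrative fully adaptive recovery (not the paper's method)."""
    reps = []     # reps[i] is one representative point of cluster i
    members = []  # members[i] lists the points assigned to cluster i
    queries = 0
    for x in range(n):
        queries += 1
        # The representatives lie in distinct clusters, so the count grows
        # by one exactly when x belongs to none of them.
        if count(reps + [x]) == len(reps) + 1:
            reps.append(x)
            members.append([x])
            continue
        lo, hi = 0, len(reps)  # invariant: x's cluster is represented in reps[lo:hi]
        while hi - lo > 1:
            mid = (lo + hi) // 2
            queries += 1
            # Adding x leaves the count unchanged iff x shares a cluster
            # with one of reps[lo:mid].
            if count(reps[lo:mid] + [x]) == mid - lo:
                hi = mid
            else:
                lo = mid
        members[lo].append(x)
    return members, queries
```

For example, with `labels = [0, 1, 0, 2, 1]`, `recover_adaptive(5, make_oracle(labels))` returns `([[0, 2], [1, 4], [3]], 8)`. This uses 8 queries instead of the 10 needed to query every pair, and the gap widens as n grows relative to k.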