🤖 AI Summary
High-dimensional pattern matching with wildcards suffers from poor query efficiency: for queries with up to $w = c\log n$ wildcards, the fastest known linear-space data structure has query time $n^c$, which is nontrivial only when $c<1$.
Method: We establish a generic reduction framework linking data structure design to communication complexity, systematically transforming linear-space high-dimensional pattern matching into the analysis of unambiguous Arthur–Merlin (UAM) communication complexity under product distributions. Crucially, we identify intrinsic data sparsity as a fundamental lever for reducing both communication and query complexity, enabling dimension-independent upper bounds on UAM complexity. Leveraging a one-sided-error Set-Disjointness protocol and sublinear communication techniques, we construct a data structure with space close to linear.
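Results: Our approach achieves wildcard matching query time $n^{1-1/(c\log^2 c)}$ with space close to linear, improving on the $n^c$ upper bound of Cole, Gottlieb and Lewenstein (STOC’04), which was the fastest known linear-space data structure, and marking the first asymptotic improvement for this problem in that space regime.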
📝 Abstract
We present a general framework for designing efficient data structures for high-dimensional pattern-matching problems ($\exists\,?\; i\in[n],\ f(x_i,y)=1$) through communication models in which $f(x,y)$ admits sublinear communication protocols with exponentially-small error. Specifically, we reduce the data structure problem to the Unambiguous Arthur-Merlin (UAM) communication complexity of $f(x,y)$ under product distributions.
We apply our framework to the Partial Match problem (a.k.a. matching with wildcards), whose underlying communication problem is sparse set-disjointness. When the database consists of $n$ points in dimension $d$, and the number of $\star$'s in the query is at most $w = c\log n \;(\ll d)$, the fastest known linear-space data structure (Cole, Gottlieb and Lewenstein, STOC'04) had query time $t \approx 2^w = n^c$, which is nontrivial only when $c<1$. By contrast, our framework produces a data structure with query time $n^{1-1/(c \log^2 c)}$ and space close to linear.
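To make the problem statement concrete, the following is a minimal Python sketch of Partial Match over binary strings, together with two folklore baselines: a linear scan over all $n$ points, and a hash-set lookup that enumerates all $2^w$ ways of filling in the wildcards. The second baseline only illustrates where a $2^w$ factor can come from; it is not the Cole, Gottlieb and Lewenstein structure and not the data structure of this paper. The names `matches`, `linear_scan`, and `enumerate_completions`, and the string encoding with `*` as the wildcard, are assumptions made for illustration.

```python
from __future__ import annotations

from itertools import product

WILDCARD = "*"  # illustrative encoding of the wildcard symbol


def matches(x: str, y: str) -> bool:
    """f(x, y) = 1 iff every non-wildcard symbol of query y agrees with point x."""
    return len(x) == len(y) and all(
        q == WILDCARD or p == q for p, q in zip(x, y)
    )


def linear_scan(database: list[str], y: str) -> bool:
    """Decide whether some x_i matches y by testing all n points."""
    return any(matches(x, y) for x in database)


def enumerate_completions(database_set: set[str], y: str) -> bool:
    """Fill the w wildcards of y in all 2^w ways and look each string up."""
    stars = [i for i, ch in enumerate(y) if ch == WILDCARD]
    chars = list(y)
    for bits in product("01", repeat=len(stars)):
        for pos, bit in zip(stars, bits):
            chars[pos] = bit
        if "".join(chars) in database_set:
            return True
    return False


database = ["0110", "1011", "0001"]
print(linear_scan(database, "0*1*"))                 # True ("0110" matches)
print(enumerate_completions(set(database), "0*1*"))  # True
print(linear_scan(database, "11**"))                 # False
```

The linear scan performs $n$ comparisons regardless of $w$, while the enumeration performs up to $2^w$ hash lookups regardless of $n$; with $w = c\log n$ (logarithm base 2) the latter is $n^c$ lookups, which beats the scan only when $c<1$.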
To achieve this, we develop a one-sided $ε$-error communication protocol for Set-Disjointness under product distributions with $\tilde{Θ}(\sqrt{d\log(1/ε)})$ complexity, improving on the classical result of Babai, Frankl and Simon (FOCS'86). Building on this protocol, we show that the Unambiguous AM communication complexity of $w$-Sparse Set-Disjointness with $ε$-error under product distributions is $\tilde{O}(\sqrt{w \log(1/ε)})$, independent of the ambient dimension $d$, which is crucial for the Partial Match result. Our framework sheds further light on the power of data-dependent data structures, which is instrumental for reducing to the (much easier) case of product distributions.