A Framework for Building Data Structures from Communication Protocols

📅 2025-06-25

🤖 AI Summary

High-dimensional pattern matching with wildcards suffers from poor query efficiency: when a query contains $w = c\log n$ wildcards, the fastest known linear-space data structure has query time $n^c$, which beats a linear scan only when $c < 1$. Method: We establish a generic reduction framework linking data structure design to communication complexity, reducing high-dimensional pattern matching in near-linear space to the unambiguous Arthur–Merlin (UAM) communication complexity of the underlying problem under product distributions. Crucially, we identify the sparsity of the underlying Set-Disjointness instances as a fundamental lever for reducing both communication and query complexity, which yields UAM upper bounds that are independent of the dimension. Leveraging a one-sided-error Set-Disjointness protocol with sublinear communication, we construct a data structure with near-linear space. Results: Our approach achieves wildcard-matching query time $n^{1-1/(c\log^2 c)}$, improving on the $n^c$ bound of the data structure of Cole, Gottlieb and Lewenstein (STOC'04).
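As a rough numerical illustration of the two bounds, the Python sketch below evaluates the exponent of $n$ in each query time. It assumes base-2 logarithms, drops all constants and lower-order factors hidden in the asymptotic notation (including the factor $d$ in a linear scan), and only evaluates the new bound for $c > 1$; the helper names are hypothetical. The values show the trend (sublinear for every $c > 1$ under these simplifications, approaching linear as $c$ grows), not actual running times.

```python
import math


def previous_exponent(c: float) -> float:
    """Best of the n^c structure (STOC'04) and a linear scan (exponent 1, factor d ignored)."""
    return min(c, 1.0)


def framework_exponent(c: float) -> float:
    """Exponent in n^(1 - 1/(c log^2 c)), assuming base-2 logs and no hidden constants."""
    if c <= 1:
        raise ValueError("this sketch only evaluates the bound for c > 1")
    return 1.0 - 1.0 / (c * math.log2(c) ** 2)


for c in (2, 4, 8, 16):
    print(f"c = {c:2d}: previous n^{previous_exponent(c):.4f}, "
          f"framework n^{framework_exponent(c):.4f}")
```

For $c = 4$ the sketch yields $1 - 1/(4 \cdot 2^2) = 0.9375$, whereas $n^c = n^4$ is worse than scanning all $n$ points.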

📝 Abstract

We present a general framework for designing efficient data structures for high-dimensional pattern-matching problems (deciding whether $\exists\, i \in [n]$ with $f(x_i, y) = 1$) through communication models in which $f(x, y)$ admits sublinear communication protocols with exponentially small error. Specifically, we reduce the data structure problem to the Unambiguous Arthur-Merlin (UAM) communication complexity of $f(x, y)$ under product distributions. We apply our framework to the Partial Match problem (a.k.a. matching with wildcards), whose underlying communication problem is sparse set-disjointness. When the database consists of $n$ points in dimension $d$, and the number of $\star$'s in the query is at most $w = c\log n \;(\ll d)$, the fastest known linear-space data structure (Cole, Gottlieb and Lewenstein, STOC'04) had query time $t \approx 2^w = n^c$, which is nontrivial only when $c < 1$. By contrast, our framework produces a data structure with query time $n^{1-1/(c\log^2 c)}$ and space close to linear. To achieve this, we develop a one-sided $\varepsilon$-error communication protocol for Set-Disjointness under product distributions with $\tilde{\Theta}(\sqrt{d\log(1/\varepsilon)})$ complexity, improving on the classical result of Babai, Frankl and Simon (FOCS'86). Building on this protocol, we show that the Unambiguous AM communication complexity of $w$-Sparse Set-Disjointness with $\varepsilon$-error under product distributions is $\tilde{O}(\sqrt{w\log(1/\varepsilon)})$, independent of the ambient dimension $d$, which is crucial for the Partial Match result. Our framework sheds further light on the power of data-dependent data structures: data dependence is instrumental for reducing to the (much easier) case of product distributions.
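To make the Partial Match problem concrete, the Python sketch below implements the predicate $f(x, y)$ for binary points and queries over $\{0, 1, \star\}^d$ (with `*` standing for $\star$), together with two folklore baselines: a linear scan, and a linear-space hash set that answers a query by looking up all $2^w$ completions of its wildcards, the simplest way to get a $2^w$-type query time. It also checks one standard encoding of a match as a set-disjointness instance, chosen here for illustration and not necessarily the encoding used in the paper. None of this is the Cole, Gottlieb and Lewenstein structure or the paper's data structure; all names are hypothetical.

```python
from itertools import product

STAR = "*"  # wildcard symbol in queries


def matches(x: str, y: str) -> bool:
    """f(x, y) = 1 iff point x agrees with query y on every non-wildcard coordinate."""
    if len(x) != len(y):
        raise ValueError("point and query must have the same dimension d")
    return all(q == STAR or q == b for b, q in zip(x, y))


def linear_scan(db: list[str], y: str) -> bool:
    """Decide whether some i has f(x_i, y) = 1 by checking every point: O(n*d) time."""
    return any(matches(x, y) for x in db)


class WildcardExpansionIndex:
    """Linear-space baseline: store the points in a hash set and, at query time,
    look up each of the 2^w completions of the w wildcards (O(2^w * d) expected time)."""

    def __init__(self, db: list[str]) -> None:
        self.points = set(db)

    def query(self, y: str) -> bool:
        star_positions = [j for j, q in enumerate(y) if q == STAR]
        chars = list(y)
        # product(...) yields a single empty tuple when w = 0, so y itself is looked up.
        for bits in product("01", repeat=len(star_positions)):
            for j, b in zip(star_positions, bits):
                chars[j] = b
            if "".join(chars) in self.points:
                return True
        return False


def matches_via_disjointness(x: str, y: str) -> bool:
    """Match test phrased as set-disjointness: the query's fixed coordinates
    {(j, y_j)} must avoid the point's complemented coordinates {(j, 1 - x_j)}."""
    query_set = {(j, q) for j, q in enumerate(y) if q != STAR}
    point_set = {(j, "1" if b == "0" else "0") for j, b in enumerate(x)}
    return query_set.isdisjoint(point_set)


db = ["0110", "1011", "0001"]
index = WildcardExpansionIndex(db)
for y in ("*0*1", "11**"):
    print(y, linear_scan(db, y), index.query(y))
# prints "*0*1 True True", then "11** False False"
```

The disjointness view works because a binary point $x$ fails to match $y$ exactly when some fixed coordinate has $y_j = 1 - x_j$, which is precisely a common element of the two sets.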

Problem

Research questions and challenges this paper addresses.

Design efficient data structures for high-dimensional pattern-matching problems

Improve the query time of near-linear-space data structures for the Partial Match problem

Develop communication protocols for Set-Disjointness under product distributions

Innovation

Methods, ideas, or system contributions that make the work stand out.

Reduces data structure design to UAM communication complexity under product distributions

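Gives a one-sided-error Set-Disjointness protocol under product distributions with $\tilde{\Theta}(\sqrt{d\log(1/\varepsilon)})$ communication, improving on Babai, Frankl and Simon (FOCS'86)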

Achieves dimension-independent UAM communication for $w$-Sparse Set-Disjointness
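The dimension-independence claim can be illustrated by evaluating only the leading terms of the two communication bounds, $\sqrt{d\log(1/\varepsilon)}$ and $\sqrt{w\log(1/\varepsilon)}$, and dropping the polylogarithmic factors hidden by $\tilde{\Theta}$ and $\tilde{O}$. This Python sketch is a back-of-the-envelope comparison under that assumption; the parameter values ($n = 2^{20}$, $c = 4$, $\varepsilon = 2^{-40}$, and the two choices of $d$) are arbitrary and not taken from the paper, and the function names are hypothetical.

```python
import math


def dense_leading_term(d: int, eps: float) -> float:
    """sqrt(d * log2(1/eps)): leading term of the Set-Disjointness bound, polylog factors dropped."""
    return math.sqrt(d * math.log2(1 / eps))


def sparse_leading_term(w: int, eps: float) -> float:
    """sqrt(w * log2(1/eps)): leading term of the w-Sparse UAM bound, polylog factors dropped."""
    return math.sqrt(w * math.log2(1 / eps))


n, c, eps = 2**20, 4, 2.0**-40
w = int(c * math.log2(n))  # number of wildcards, w = c log n
for d in (10_000, 1_000_000):
    # Trivial protocol: one party sends its whole d-bit characteristic vector.
    print(f"d = {d:,}: trivial {d} bits, "
          f"dense ~{dense_leading_term(d, eps):.0f}, sparse ~{sparse_leading_term(w, eps):.0f}")
```

Growing $d$ a hundredfold raises the dense term tenfold, while the sparse term, which depends only on $w$ and $\varepsilon$, does not change.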
