SHARP: Shared State Reduction for Efficient Matching of Sequential Patterns

📅 2025-07-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
Multi-pattern sequence matching in Complex Event Processing (CEP), Online Analytical Processing (OLAP), and Retrieval-Augmented Generation (RAG) systems faces two challenges at once: intermediate state that grows exponentially in the input size, and tight latency bounds. This paper presents SHARP, a library built around a new abstraction, the pattern-sharing degree (PSD), which captures how state is shared between patterns. At runtime, PSD is used to categorize and index partial matches; once a latency bound is exceeded, SHARP selects a subset of partial matches for further processing in constant time, combining state sharing with state reduction. In experiments with real-world data, under a bound of 50% of the average processing latency, SHARP achieves a recall of 97%, 96%, and 73% for CEP, OLAP, and RAG workloads, respectively, whereas existing techniques consider each pattern in isolation.

📝 Abstract
The detection of sequential patterns in data is a basic functionality of modern data processing systems for complex event processing (CEP), OLAP, and retrieval-augmented generation (RAG). In practice, pattern matching is challenging, since common applications rely on a large set of patterns that shall be evaluated with tight latency bounds. At the same time, matching needs to maintain state, i.e., intermediate results, that grows exponentially in the input size. Hence, systems turn to best-effort processing, striving for maximal recall under a latency bound. Existing techniques, however, consider each pattern in isolation, neglecting the optimization potential induced by state sharing in pattern matching. In this paper, we present SHARP, a library that employs state reduction to achieve efficient best-effort pattern matching. To this end, SHARP incorporates state sharing between patterns through a new abstraction, coined pattern-sharing degree (PSD). At runtime, this abstraction facilitates the categorization and indexing of partial pattern matches. Based thereon, once a latency bound is exceeded, SHARP realizes best-effort processing by selecting a subset of partial matches for further processing in constant time. In experiments with real-world data, SHARP achieves a recall of 97%, 96% and 73% for pattern matching in CEP, OLAP, and RAG applications, under a bound of 50% of the average processing latency.
Problem

Research questions and friction points this paper is trying to address.

Efficient matching of sequential patterns in data processing systems
Reducing exponential state growth in pattern matching
Optimizing state sharing between patterns for latency bounds
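
To make the exponential state growth concrete, here is a minimal sketch in Python. It is not taken from the paper: it assumes a single Kleene-closure pattern `A+` evaluated so that any non-empty subsequence of `A` events counts as a partial match (a common relaxed selection policy in CEP). The function name `partial_matches_kleene` is hypothetical.

```python
from itertools import combinations


def partial_matches_kleene(events, event_type="A"):
    """Enumerate partial matches of the assumed pattern `A+`.

    Assumption: every non-empty subsequence of matching events is a
    partial match, so n matching events yield 2**n - 1 partial matches.
    Each partial match is returned as a tuple of event positions.
    """
    positions = [i for i, e in enumerate(events) if e == event_type]
    return [
        combo
        for size in range(1, len(positions) + 1)
        for combo in combinations(positions, size)
    ]


def growth_table(max_events):
    """Number of partial matches for 1..max_events matching events."""
    return [len(partial_matches_kleene(["A"] * n)) for n in range(1, max_events + 1)]
```

Under this assumption, each additional matching event roughly doubles the number of partial matches, which is why a system has to discard some state to stay within a latency bound.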
Innovation

Methods, ideas, or system contributions that make the work stand out.

State sharing via pattern-sharing degree (PSD)
Constant-time subset selection for latency bounds
Efficient best-effort pattern matching with high recall
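
The first two points above can be illustrated with a small Python sketch. This is an illustration under stated assumptions, not SHARP's implementation: the page does not define PSD precisely, so the sketch assumes the sharing degree of a partial match is the number of patterns that begin with the same event-type prefix. `PATTERNS`, `sharing_degree`, `PartialMatchIndex`, and `demo` are hypothetical names.

```python
# Hypothetical pattern set: each pattern is a sequence of event types.
PATTERNS = [("A", "B", "C"), ("A", "B", "D"), ("A", "E")]


def sharing_degree(prefix, patterns):
    """Assumed stand-in for PSD: how many patterns start with this prefix."""
    return sum(1 for p in patterns if p[:len(prefix)] == prefix)


class PartialMatchIndex:
    """Toy index that buckets partial matches by their sharing degree."""

    def __init__(self, max_degree):
        # One bucket per possible degree, 0..max_degree.
        self.buckets = [[] for _ in range(max_degree + 1)]

    def add(self, partial_match, degree):
        self.buckets[degree].append(partial_match)

    def select(self, min_degree):
        # Returns whole buckets without scanning individual partial matches:
        # the cost depends on the number of degree levels, which is fixed
        # by the pattern set, not on how many partial matches are stored.
        return self.buckets[min_degree:]


def demo():
    """Index four partial matches, then keep only those shared by >= 2 patterns."""
    index = PartialMatchIndex(max_degree=len(PATTERNS))
    for prefix in [("A",), ("A", "B"), ("A", "E"), ("A", "B", "C")]:
        index.add(prefix, sharing_degree(prefix, PATTERNS))
    return [pm for bucket in index.select(min_degree=2) for pm in bucket]
```

In this sketch, `demo()` keeps the prefixes `("A", "B")` and `("A",)`, which serve two and three patterns respectively, and drops the two prefixes that serve only one pattern. Favoring widely shared state is one plausible way to preserve recall across many patterns when the latency bound forces state to be discarded.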