AI Summary
Frequent data updates quickly make provenance sketches stale, while rebuilding them from scratch incurs prohibitively high maintenance overhead, which hinders their deployment in dynamic environments. To address this, we propose IMP, an in-memory incremental maintenance framework and the first to support efficient incremental updates of provenance sketches. Leveraging the coarse-grained nature of sketches, IMP introduces a lightweight incremental query engine that avoids both recomputation and full rebuilds. It combines in-memory computation, incremental processing, and data-skipping optimizations to drastically reduce maintenance cost without compromising the correctness of queries that use the sketches. Experimental results show that IMP reduces sketch update latency by one to two orders of magnitude and sustains high-frequency updates under realistic workloads, making provenance sketches practical at scale.
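To make the staleness problem concrete, here is a minimal Python example under stated assumptions; it is not IMP's implementation. It assumes hypothetical `(key, value)` rows, a table split into fixed key ranges called fragments, a simple selection query `value > 50`, and a provenance sketch taken to be the set of fragments that contain at least one result row. All names (`BOUNDS`, `capture_sketch`, `run_with_sketch`, and so on) are illustrative only.

```python
from bisect import bisect_right

# Hypothetical setup (not IMP's design): rows are (key, value) pairs, the
# table is split into fixed key ranges ("fragments"), and the query is
# "SELECT * WHERE value > 50".
BOUNDS = [0, 100, 200, 300, 400]  # fragment i covers [BOUNDS[i], BOUNDS[i+1])


def fragment_of(key):
    """Index of the range fragment that contains key (0 <= key < 400)."""
    return bisect_right(BOUNDS, key) - 1


def partition(rows):
    """Group rows by fragment so that whole fragments can be skipped."""
    frags = {i: [] for i in range(len(BOUNDS) - 1)}
    for row in rows:
        frags[fragment_of(row[0])].append(row)
    return frags


def query(rows):
    """The example query: rows whose value exceeds 50, in sorted order."""
    return sorted(r for r in rows if r[1] > 50)


def all_rows(frags):
    """Every row in the table, regardless of fragment."""
    return [r for rows in frags.values() for r in rows]


def capture_sketch(frags):
    """Over-approximate the query's provenance by the set of fragments
    that contain at least one result row."""
    return {i for i, rows in frags.items() if query(rows)}


def run_with_sketch(frags, sketch):
    """Evaluate the query while reading only the fragments in the sketch."""
    return query(r for i in sketch for r in frags[i])


frags = partition([(10, 60), (150, 20), (250, 90), (320, 5)])
sketch = capture_sketch(frags)  # {0, 2}: fragments 1 and 3 are skipped
assert run_with_sketch(frags, sketch) == query(all_rows(frags))

# An insert into a skipped fragment makes the captured sketch stale.
frags[fragment_of(130)].append((130, 80))
print(run_with_sketch(frags, sketch) == query(all_rows(frags)))  # False
```

Recapturing the sketch (`capture_sketch(frags)` now returns `{0, 1, 2}`) restores correct results, but it rescans every fragment, which is the full-rebuild cost that incremental maintenance aims to avoid.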
Abstract
Provenance-based data skipping compactly over-approximates the provenance of a query using so-called provenance sketches and uses such sketches to speed up the execution of subsequent queries by skipping irrelevant data. However, a sketch captured at some point in the past may become stale if the data has since been updated. Thus, there is a need to maintain provenance sketches. In this work, we introduce In-Memory incremental Maintenance of Provenance sketches (IMP), a framework for maintaining sketches incrementally under updates. At the core of IMP is an incremental query engine for data annotated with sketches that exploits the coarse-grained nature of sketches to enable novel optimizations. We experimentally demonstrate that IMP significantly reduces the cost of sketch maintenance, thereby enabling the use of provenance sketches for a broad range of workloads that involve updates.
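Incremental maintenance of a coarse-grained sketch can be illustrated on a toy setting. The example below is a hypothetical illustration, not the engine described in the abstract: it assumes a sketch over fixed key ranges and a single selection query, and it keeps a per-fragment count of qualifying rows so that each insert or delete updates at most one counter instead of rescanning the table. `IncrementalSketch`, `full_rebuild`, and the other names are made up for this example.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical setup (not IMP's engine): rows are (key, value) pairs, the
# sketch is defined over fixed key ranges, and the query is "value > 50".
BOUNDS = [0, 100, 200, 300, 400]  # fragment i covers [BOUNDS[i], BOUNDS[i+1])


def fragment_of(key):
    """Index of the range fragment that contains key (0 <= key < 400)."""
    return bisect_right(BOUNDS, key) - 1


def qualifies(row):
    """Does the row satisfy the example selection predicate?"""
    return row[1] > 50


def full_rebuild(table):
    """Baseline: rescan the whole table to recompute the sketch."""
    return {fragment_of(row[0]) for row in table if qualifies(row)}


class IncrementalSketch:
    """Per-fragment count of qualifying rows; a fragment is in the sketch
    iff its count is positive. Each update touches at most one counter."""

    def __init__(self, table):
        self.counts = Counter(fragment_of(r[0]) for r in table if qualifies(r))

    def insert(self, row):
        if qualifies(row):
            self.counts[fragment_of(row[0])] += 1

    def delete(self, row):
        # Assumes the row is currently present in the table.
        if qualifies(row):
            frag = fragment_of(row[0])
            self.counts[frag] -= 1
            if self.counts[frag] == 0:
                del self.counts[frag]  # fragment no longer contributes

    def fragments(self):
        return set(self.counts)


table = [(10, 60), (150, 20), (250, 90)]
sketch = IncrementalSketch(table)
for row in [(120, 70), (310, 10)]:  # two inserts
    table.append(row)
    sketch.insert(row)
table.remove((10, 60))  # one delete
sketch.delete((10, 60))
print(sorted(sketch.fragments()))  # [1, 2]
assert sketch.fragments() == full_rebuild(table)
```

Keeping counts rather than a plain set of fragments is what makes deletions cheap here: with only a set, removing a qualifying row would require rescanning its fragment to decide whether the fragment still belongs in the sketch. Queries with joins or aggregation need more state than per-fragment counts, which this toy does not attempt.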