In-memory Incremental Maintenance of Provenance Sketches [extended version]

📅 2025-05-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Frequent data updates cause provenance sketches to become stale quickly, while rebuilding them from scratch incurs maintenance overhead high enough to hinder their deployment in dynamic environments. To address this, we propose IMP, an in-memory incremental maintenance framework that is the first to enable efficient incremental updates of provenance sketches. Leveraging the coarse-grained nature of sketches, IMP introduces a lightweight incremental query engine that avoids full recomputation of the sketch. It combines in-memory computation, incremental processing, and data-skipping optimizations to reduce maintenance cost without compromising query accuracy. Experimental results demonstrate that IMP reduces sketch update latency by one to two orders of magnitude and sustains high-frequency updates under realistic workloads, making provenance sketches practical for workloads with updates.

📝 Abstract

Provenance-based data skipping compactly over-approximates the provenance of a query using so-called provenance sketches and utilizes such sketches to speed up the execution of subsequent queries by skipping irrelevant data. However, a sketch captured at some time in the past may become stale if the data has been updated subsequently. Thus, there is a need to maintain provenance sketches. In this work, we introduce In-Memory incremental Maintenance of Provenance sketches (IMP), a framework for maintaining sketches incrementally under updates. At the core of IMP is an incremental query engine for data annotated with sketches that exploits the coarse-grained nature of sketches to enable novel optimizations. We experimentally demonstrate that IMP significantly reduces the cost of sketch maintenance, thereby enabling the use of provenance sketches for a broad range of workloads that involve updates.
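
To make the idea concrete, the following is a minimal illustrative sketch and not IMP's actual engine. It assumes a simple selection query, a hypothetical range partition on a `price` column, and a per-fragment counter as the maintenance state; the names `BOUNDS`, `fragment_of`, `in_provenance`, and `SketchState` are invented for this example. The sketch is the set of fragments that contain at least one row of the query's provenance, so it over-approximates the provenance at fragment granularity.

```python
from bisect import bisect_right
from collections import Counter

# Assumed range partition on `price`: fragment i covers [BOUNDS[i], BOUNDS[i+1]);
# the last fragment is open-ended.
BOUNDS = [0, 100, 200, 300]


def fragment_of(price):
    """Return the index of the fragment that contains `price`."""
    return bisect_right(BOUNDS, price) - 1


def in_provenance(row):
    """Example query: SELECT * FROM orders WHERE qty >= 10."""
    return row["qty"] >= 10


class SketchState:
    """Per-fragment counts of provenance rows (an assumed, simplified state)."""

    def __init__(self, rows):
        # Initial capture: one full pass over the data.
        self.counts = Counter(
            fragment_of(r["price"]) for r in rows if in_provenance(r)
        )

    def sketch(self):
        # A fragment belongs to the sketch if it holds at least one provenance row.
        return {f for f, c in self.counts.items() if c > 0}

    def insert(self, row):
        # Incremental maintenance: only the inserted row is inspected.
        if in_provenance(row):
            self.counts[fragment_of(row["price"])] += 1

    def delete(self, row):
        if in_provenance(row):
            self.counts[fragment_of(row["price"])] -= 1


def scan_with_skipping(rows, sketch):
    """Evaluate the query, skipping rows in fragments outside the sketch."""
    return [
        r for r in rows
        if fragment_of(r["price"]) in sketch and in_provenance(r)
    ]


rows = [
    {"price": 50, "qty": 12},
    {"price": 150, "qty": 3},
    {"price": 250, "qty": 20},
]
state = SketchState(rows)
initial_sketch = state.sketch()

new_row = {"price": 120, "qty": 15}
rows.append(new_row)
state.insert(new_row)
after_insert = state.sketch()

old_row = rows.pop(0)
state.delete(old_row)
after_delete = state.sketch()
```

In this toy setting each update touches only the affected row and one counter, rather than recapturing the sketch with a full scan. Queries with joins or aggregation need richer state than a counter per fragment, which is the kind of problem an incremental engine has to solve.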

Problem

Research questions and friction points this paper is trying to address.

Maintaining stale provenance sketches after data updates

Incremental sketch maintenance for query speed-up

Reducing sketch maintenance cost for workloads that involve updates

Innovation

Methods, ideas, or system contributions that make the work stand out.

In-memory incremental sketch maintenance framework

Exploits coarse-grained sketches for optimizations

Reduces cost of provenance sketch maintenance
