In-memory Incremental Maintenance of Provenance Sketches [extended version]

2025-05-27
AI Summary
Frequent data updates cause provenance sketches to rapidly become stale, while full reconstruction incurs prohibitively high maintenance overhead, hindering their deployment in dynamic environments. To address this, we propose IMP, an in-memory incremental maintenance framework, the first to enable efficient incremental updates of provenance sketches. Leveraging the coarse-grained nature of sketches, IMP introduces a lightweight incremental query engine that avoids recomputation and full rebuilds. It synergistically integrates in-memory computation, incremental processing, and data-skipping optimizations to drastically reduce maintenance cost without compromising query accuracy. Experimental results demonstrate that IMP reduces sketch update latency by one to two orders of magnitude and sustains high-frequency updates under realistic workloads, thereby conferring practical scalability to provenance sketches.

๐Ÿ“ Abstract
Provenance-based data skipping compactly over-approximates the provenance of a query using so-called provenance sketches and uses such sketches to speed up the execution of subsequent queries by skipping irrelevant data. However, a sketch captured at some time in the past may become stale if the data has been updated since. Thus, there is a need to maintain provenance sketches. In this work, we introduce In-Memory incremental Maintenance of Provenance sketches (IMP), a framework for maintaining sketches incrementally under updates. At the core of IMP is an incremental query engine for data annotated with sketches that exploits the coarse-grained nature of sketches to enable novel optimizations. We experimentally demonstrate that IMP significantly reduces the cost of sketch maintenance, thereby enabling the use of provenance sketches for a broad range of workloads that involve updates.
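To make the idea of incremental sketch maintenance concrete, here is a minimal Python sketch. It is not IMP's engine or the authors' algorithm; it is a toy under stated assumptions: a single table, a single selection query, and a sketch defined as the set of range-partition fragments that contain at least one tuple in the query's provenance. The class name `RangeSketch`, the row fields `key` and `value`, and the per-fragment counters are all hypothetical choices made for illustration.

```python
from bisect import bisect_right
from collections import Counter


class RangeSketch:
    """Toy provenance sketch for a selection query over one table.

    Assumption: the table is range-partitioned on `key` by the sorted
    split points in `boundaries`. The sketch is the set of fragment ids
    holding at least one row that satisfies `predicate`.
    """

    def __init__(self, boundaries, predicate):
        self.boundaries = boundaries  # e.g. [10, 20, 30] gives fragments 0..3
        self.predicate = predicate    # the query's selection condition
        self.counts = Counter()       # fragment id -> number of qualifying rows

    def fragment(self, key):
        # Map a key to the id of the range fragment that contains it.
        return bisect_right(self.boundaries, key)

    def apply_delta(self, inserted=(), deleted=()):
        # Process only the changed rows instead of rescanning the table.
        for row in inserted:
            if self.predicate(row):
                self.counts[self.fragment(row["key"])] += 1
        for row in deleted:
            if self.predicate(row):
                frag = self.fragment(row["key"])
                self.counts[frag] -= 1
                if self.counts[frag] <= 0:
                    # No qualifying row is left, so the fragment leaves the sketch.
                    del self.counts[frag]

    def fragments(self):
        # Fragments that a later query must still read; all others can be skipped.
        return set(self.counts)


sketch = RangeSketch([10, 20, 30], lambda row: row["value"] > 100)
sketch.apply_delta(inserted=[
    {"key": 5, "value": 150},
    {"key": 12, "value": 50},
    {"key": 25, "value": 200},
])
print(sorted(sketch.fragments()))
sketch.apply_delta(deleted=[{"key": 25, "value": 200}])
print(sorted(sketch.fragments()))
```

Keeping a count per fragment, rather than a plain set, is what lets a delete remove a fragment from the sketch without rescanning the data. Because the sketch is coarse-grained, most updates change only a counter and leave the set of fragments untouched.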
Problem

Research questions and friction points this paper is trying to address.

Maintaining stale provenance sketches after data updates
Incremental sketch maintenance for query speed-up
Reducing sketch maintenance cost for updated workloads
Innovation

Methods, ideas, or system contributions that make the work stand out.

In-memory incremental sketch maintenance framework
Exploits coarse-grained sketches for optimizations
Reduces cost of provenance sketch maintenance
Pengyuan Li
Illinois Institute of Technology, USA
Boris Glavic
Associate Professor, University of Illinois at Chicago
Databases, Uncertainty, Data Science, Data Provenance, Data Integration
Dieter Gawlick
Oracle Corporation, USA
Vasudha Krishnaswamy
Oracle Corporation, USA
Zhen Hua Liu
Oracle Corporation, USA
Danica Porobic
Oracle
Database Management Systems
Xing Niu
Oracle Corporation, USA