Query-Guided Analysis and Mitigation of Data Verification Errors (Extended Version)

📅 2026-03-09
🤖 AI Summary
This work addresses label errors introduced by data verification, which can propagate to downstream query results and compromise their reliability. To quantify the impact of such errors, the authors propose the Maximal Error Score (MES), a worst-case uncertainty metric that is independent of the underlying data distribution. They also identify risky tuples: input tuples for which reducing label uncertainty may counterintuitively increase output uncertainty. Building on both indicators, they design MESReduce, a generic algorithm that interacts with external verifiers to select effective additional verification steps. Experiments on real and synthetic datasets show that MESReduce substantially reduces the MES and improves the accuracy of verification results.

📝 Abstract
Data verification, the process of labeling data items as correct or incorrect, is a preprocessing step that may critically affect the quality of results in data-driven pipelines. Despite recent advances, verification can still produce erroneous labels that propagate to downstream query results in complex ways. We present a framework that complements existing verification tools by assessing the impact of potential labeling errors on query outputs and guiding additional verification steps to improve result reliability. To this end, we introduce Maximal Error Score (MES), a worst-case uncertainty metric that quantifies the reliability of query output tuples independently of the underlying data distribution. As an auxiliary indicator, we identify risky tuples: input tuples for which reducing label uncertainty may counterintuitively increase the output uncertainty. We then develop efficient algorithms for computing MES and detecting risky tuples, as well as a generic algorithm, named MESReduce, that builds on both indicators and interacts with external verifiers to select effective additional verification steps. We implement our techniques in a prototype system and evaluate them on real and synthetic datasets, demonstrating that MESReduce can substantially and effectively reduce the MES and improve the accuracy of verification results.
Problem

Research questions and friction points this paper is trying to address.

data verification
labeling errors
query reliability
error propagation
uncertainty quantification
Innovation

Methods, ideas, or system contributions that make the work stand out.

Maximal Error Score
Data Verification
Query-Guided Mitigation
Risky Tuples
Uncertainty Quantification
Ran Schreiber
The Department of Computer Science, Bar-Ilan University, Ramat Gan, Israel
Yael Amsterdamer
Professor, the Department of Computer Science, Bar-Ilan University
Data Management, Data Exploration, Crowdsourcing