Unfolding Data Quality Dimensions in Practice: A Survey

📅 2025-07-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
A significant gap exists between the theoretical definitions of data quality dimensions (such as accuracy, completeness, consistency, and timeliness) stipulated in standards such as ISO/IEC 25012 and the functionalities actually implemented in widely used data quality tools; to date, no systematic mapping between the two has been carried out.
Method: We conducted a systematic literature review and reverse-engineered the functionalities of seven mainstream open-source data quality tools.
Contribution/Results: We present the first many-to-many mapping framework linking data quality dimensions to concrete tool capabilities. It introduces a fine-grained functional categorization schema that cuts across dimensions and yields a structured correspondence matrix covering all core dimensions. This work bridges the theory–practice divide, offers an actionable guide for data quality assessment, and provides a more rigorous and reusable basis for tool selection, functionality design, and standard implementation.
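A many-to-many correspondence matrix of this kind can be pictured as a mapping from functionalities to sets of dimensions, plus its inverse. The sketch below is illustrative only: the functionality names (`missing_value_check`, `outlier_detection`, `range_check`, `freshness_check`) and their dimension assignments are hypothetical and are not the mapping reported in the survey.

```python
from collections import defaultdict

# Hypothetical functionality -> dimensions assignments, for illustration only.
# These are NOT the survey's actual mapping.
FUNCTIONALITY_TO_DIMENSIONS: dict[str, set[str]] = {
    "missing_value_check": {"completeness"},
    "outlier_detection": {"accuracy"},
    "range_check": {"accuracy", "consistency"},  # one functionality, two dimensions
    "freshness_check": {"timeliness"},
}


def invert(mapping: dict[str, set[str]]) -> dict[str, set[str]]:
    """Build the dimension -> functionalities view of a many-to-many mapping."""
    inverted: dict[str, set[str]] = defaultdict(set)
    for functionality, dimensions in mapping.items():
        for dimension in dimensions:
            inverted[dimension].add(functionality)
    return dict(inverted)


def correspondence_matrix(
    mapping: dict[str, set[str]],
) -> tuple[list[str], list[str], list[list[int]]]:
    """Return (rows, cols, matrix): matrix[i][j] is 1 if functionality rows[i]
    contributes to dimension cols[j], else 0."""
    rows = sorted(mapping)
    cols = sorted({d for dims in mapping.values() for d in dims})
    matrix = [[int(c in mapping[r]) for c in cols] for r in rows]
    return rows, cols, matrix
```

Here `invert(FUNCTIONALITY_TO_DIMENSIONS)["accuracy"]` yields `{"outlier_detection", "range_check"}`: a dimension is assessed by several functionalities, and a single functionality can feed several dimensions.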

📝 Abstract
Data quality describes the degree to which data meet specific requirements and are fit for use by humans and/or downstream tasks (e.g., artificial intelligence). Data quality can be assessed across multiple high-level concepts called dimensions, such as accuracy, completeness, consistency, or timeliness. While extensive research and several standardization attempts (e.g., ISO/IEC 25012) exist for data quality dimensions, their practical application often remains unclear. In parallel with these research efforts, a large number of tools have been developed that implement functionalities for detecting and mitigating specific data quality issues, such as missing values or outliers. With this paper, we aim to bridge the gap between data quality theory and practice by systematically connecting the low-level functionalities offered by data quality tools with high-level dimensions, revealing their many-to-many relationships. Through an examination of seven open-source data quality tools, we provide a comprehensive mapping between their functionalities and the data quality dimensions, demonstrating how individual functionalities and their variants partially contribute to the assessment of single dimensions. This systematic survey provides both practitioners and researchers with a unified view of the fragmented landscape of data quality checks, offering actionable insights for quality assessment across multiple dimensions.
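To make "low-level functionality" concrete, the sketch below implements two common checks of the kind the abstract names: a missing-value ratio and an outlier detector. The function names are hypothetical, the outlier rule is Tukey's IQR fences (one common choice, not necessarily what any surveyed tool uses), and linking the first to completeness and the second to accuracy is an illustrative assumption rather than the survey's mapping.

```python
import math
import statistics
from typing import Optional, Sequence


def _is_missing(v: Optional[float]) -> bool:
    """Treat None and float NaN as missing values."""
    return v is None or (isinstance(v, float) and math.isnan(v))


def completeness_ratio(values: Sequence[Optional[float]]) -> float:
    """Share of non-missing entries (illustratively tied to completeness)."""
    if not values:
        raise ValueError("completeness is undefined for an empty column")
    present = sum(1 for v in values if not _is_missing(v))
    return present / len(values)


def iqr_outliers(values: Sequence[Optional[float]], k: float = 1.5) -> list[float]:
    """Return non-missing values outside [Q1 - k*IQR, Q3 + k*IQR]
    (Tukey's fences; illustratively tied to accuracy)."""
    present = [v for v in values if not _is_missing(v)]
    if len(present) < 2:
        return []  # too few points to estimate quartiles
    q1, _, q3 = statistics.quantiles(present, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in present if v < low or v > high]
```

For `[10.0, 12.0, None, 11.0, 13.0, 250.0, float("nan"), 12.5]`, `completeness_ratio` returns `0.75` and `iqr_outliers` returns `[250.0]`. Each check covers only one narrow aspect of a dimension, which is why a single dimension usually needs several such functionalities.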

Problem

Research questions and friction points this paper is trying to address.

Bridging the gap between data quality theory and practice
Mapping tool functionalities to data quality dimensions
Providing a unified view of fragmented quality checks

Innovation

Methods, ideas, or system contributions that make the work stand out.

Systematically maps low-level tool functionalities to high-level quality dimensions
Provides a unified view of fragmented quality checks