Towards AI-Augmented Data Quality Management: From Data Quality for AI to AI for Data Quality Management

📅 2024-06-16
🤖 AI Summary
Data quality (DQ) rule definition in data warehouses remains heavily manual, resulting in low efficiency and high operational costs. Method: We systematically evaluated 151 industrial-grade DQ tools and conducted a comprehensive literature review to assess AI-enabled capabilities for automated DQ rule discovery in warehouse environments. Contribution/Results: Our analysis quantitatively reveals that only 10 tools exhibit preliminary AI-driven DQ rule detection capabilities, highlighting a significant gap in both industry practice and academic research. To address this, we propose the “AI for DQ Management” paradigm, shifting DQ governance from manual rule specification toward AI-autonomous rule discovery. We introduce a capability mapping matrix and a cross-platform functional comparison framework to precisely identify critical technical bottlenecks. This work provides empirically grounded guidance for organizational tool selection and outlines a research and development roadmap for next-generation, AI-native DQ governance systems.

📝 Abstract
In the contemporary data-driven landscape, ensuring data quality (DQ) is crucial for deriving actionable insights from vast data repositories. The objective of this study is to explore the potential for automating data quality management within data warehouses, the data repositories commonly used by large organizations. By conducting a systematic review of existing DQ tools available in the market and in the academic literature, the study assesses their capability to automatically detect and enforce data quality rules. The review encompassed 151 tools from various sources, revealing that most current tools focus on data cleansing and fixing in domain-specific databases rather than data warehouses. Only ten tools demonstrated the capability to detect DQ rules, let alone to do so in data warehouses. The findings underscore a significant gap in the market and in academic research regarding AI-augmented DQ rule detection in data warehouses. This paper advocates for further development in this area to enhance the efficiency of DQ management processes, reduce human workload, and lower costs. The study highlights the necessity of advanced tools for automated DQ rule detection, paving the way for improved practices in data quality management tailored to data warehouse environments. The study can also guide organizations in selecting the data quality tool that best meets their requirements.
Problem

Research questions and friction points this paper is trying to address.

Automating data quality management in data warehouses
Assessing AI-augmented DQ rule detection capabilities
Addressing the market gap in AI-driven DQ tools
Innovation

Methods, ideas, or system contributions that make the work stand out.

AI-augmented DQ rule detection in data warehouses
Systematic review of 151 existing DQ tools
Advocacy for automated DQ rule detection to make DQ management more efficient
Heidi Carolina Tamm
Swedbank Group, Tallinn, Estonia
Anastasija Nikiforova
University of Tartu, Tartu, Estonia