🤖 AI Summary
Data quality (DQ) rule definition in data warehouses remains heavily manual, resulting in low efficiency and high operational costs. Method: We systematically evaluated 151 industrial-grade DQ tools and conducted a comprehensive literature review to assess AI-enabled capabilities for automated DQ rule discovery in warehouse environments. Contribution/Results: Our analysis quantitatively reveals that only 10 tools exhibit preliminary AI-driven DQ rule detection capabilities—highlighting a significant gap in both industry practice and academic research. To address this, we propose the “AI for DQ Management” paradigm, shifting DQ governance from manual rule specification toward AI-autonomous rule discovery. We introduce a capability mapping matrix and a cross-platform functional comparison framework to precisely identify critical technical bottlenecks. This work provides empirically grounded guidance for organizational tool selection and outlines a research and development roadmap for next-generation, AI-native DQ governance systems.
📝 Abstract
In the contemporary data-driven landscape, ensuring data quality (DQ) is crucial for deriving actionable insights from vast data repositories. The objective of this study is to explore the potential for automating data quality management within data warehouses, a type of data repository commonly used by large organizations. By conducting a systematic review of existing DQ tools available in the market and in the academic literature, the study assesses their capability to automatically detect and enforce data quality rules. The review encompassed 151 tools from various sources and revealed that most current tools focus on data cleansing and fixing in domain-specific databases rather than in data warehouses. Only ten tools demonstrated the capability to detect DQ rules at all, let alone to do so in data warehouses. The findings underscore a significant gap in the market and in academic research regarding AI-augmented DQ rule detection in data warehouses. This paper advocates for further development in this area to enhance the efficiency of DQ management processes, reduce human workload, and lower costs. The study highlights the necessity of advanced tools for automated DQ rule detection, paving the way for improved data quality management practices tailored to data warehouse environments. The study can also guide organizations in selecting the data quality tool that best meets their requirements.