🤖 AI Summary
This study addresses the lack of systematic evaluation of data quality tools with respect to their measurement capabilities and integration with large language models (LLMs). It presents the first multidimensional assessment framework grounded in real-world enterprise use cases, systematically evaluating six prominent tools (including open-source solutions such as Great Expectations and Deequ, as well as commercial platforms like Informatica and Experian) across rule definition, duplicate detection, metric aggregation, and uncertainty handling, along with their LLM integration mechanisms. The findings reveal that commercial tools offer more comprehensive functionality and preliminary support for LLM-assisted rule generation, whereas open-source tools provide greater flexibility at the cost of higher implementation effort. Notably, none of the evaluated tools currently enables direct LLM-based data validation. This work provides empirical guidance for selecting data quality tools and advancing their integration with LLMs.
📝 Abstract
High data quality is critical for reliable analytics and operational efficiency. A growing ecosystem of tools has emerged to support data quality management, ranging from lightweight open-source libraries to comprehensive enterprise platforms. This paper evaluates six data quality tools: Great Expectations, Deequ, Evidently, Informatica, Experian, and Ataccama. The evaluation criteria cover rule definition, duplicate detection, metric aggregation, and uncertainty handling, and were derived from real-world use cases of company partners. We further examine to what extent these tools integrate Large Language Models (LLMs). Our findings show that proprietary tools offer more comprehensive measurement features and emerging LLM-based assistance, while open-source tools provide flexibility at the cost of higher implementation effort. Across all tools, LLM integration remains limited to rule creation workflows. Direct data validation through LLMs is not yet supported by any of the evaluated tools.