🤖 AI Summary
Definitions of data quality have long been multidimensional and conceptually inconsistent, so a systematic synthesis is needed to establish a unified theoretical framework. This study applies Feature-Oriented Domain Analysis (FODA) to data quality research for the first time, combining it with a Systematic Literature Review (SLR) and quality dimension modeling to construct the first structured taxonomy covering mainstream definitions. We identify and clarify 12 core quality dimensions and their semantic relationships, and propose a four-level, feature-oriented taxonomy that makes definitions easier to compare and more theoretically coherent. Our analysis reveals three research gaps: (1) limited understanding of how quality dimensions evolve over time, (2) insufficient semantic alignment across domains, and (3) weak empirical validation. The resulting taxonomy provides a scalable, theoretically grounded foundation for data quality assessment, standardization, and tool development.
📝 Abstract
The digital transformation of our society is a constant challenge, as data is generated in almost every digital interaction. To be used effectively, data must be of high quality. This raises the question: what exactly is data quality? A systematic review of the existing literature shows that data quality is a multifaceted concept characterized by a number of quality dimensions. However, definitions of data quality vary widely. We used feature-oriented domain analysis to specify a taxonomy of data quality definitions and to classify the existing definitions. This allows us to identify research gaps and topics for future work.