"Detective Work We Shouldn't Have to Do": Practitioner Challenges in Regulatory-Aligned Data Quality in Machine Learning Systems

📅 2026-02-05

🤖 AI Summary

This study addresses the compliance challenges faced by data practitioners in machine learning systems under regulations such as the GDPR and the AI Act, particularly concerning data quality. Through semi-structured interviews with practitioners in the European Union, combined with thematic analysis of regulatory texts and engineering workflows, the research systematically uncovers a structural disconnect between regulation-driven data quality requirements and ML engineering practices. It identifies five core challenges: misalignment between legal principles and engineering implementation, fragmented data pipelines, lack of purpose-built compliance tools, ambiguous accountability, and reactive responses to audits. Building on these findings, the work proposes directions for designing compliance-oriented tooling, establishing effective governance mechanisms, and fostering cultural transformation to bridge the gap between regulatory mandates and practical ML development.

📝 Abstract

Ensuring data quality in machine learning (ML) systems has become increasingly complex as regulatory requirements expand. In the European Union (EU), frameworks such as the General Data Protection Regulation (GDPR) and the Artificial Intelligence Act (AI Act) articulate data quality requirements that closely parallel technical concerns in ML practice, while also extending to legal obligations related to accountability, risk management, and human rights protection. This paper presents a qualitative interview study with EU-based data practitioners working on ML systems in regulated contexts. Through semi-structured interviews, we investigate how practitioners interpret regulatory-aligned data quality, the challenges they encounter, and the supports they identify as necessary. Our findings reveal persistent gaps between legal principles and engineering workflows, fragmentation across data pipelines, limitations of existing tools, unclear responsibility boundaries between technical and legal teams, and a tendency toward reactive, audit-driven quality practices. We also identify practitioners' needs for compliance-aware tooling, clearer governance structures, and cultural shifts toward proactive data governance.

Problem

Research questions and friction points this paper is trying to address.

regulatory compliance

data quality

machine learning systems

GDPR

AI Act

Innovation

Methods, ideas, or system contributions that make the work stand out.

regulatory-aligned data quality

machine learning systems

GDPR