End-to-End Data Quality-Driven Framework for Machine Learning in Production Environment

📅 2025-12-16
🤖 AI Summary
Existing approaches treat data quality assessment and machine learning systems as isolated components, which hinders real-time coordination between them in production environments. This paper proposes an end-to-end, quality-driven framework for industrial MLOps that closes the loop between data quality evaluation and model inference. The framework combines dynamic distribution drift detection, adaptive data quality metrics, and a lightweight MLOps pipeline, and it studies how configurable data quality acceptability thresholds affect predictive performance. Evaluated on the Electroslag Remelting (ESR) vacuum pumping process at a steel manufacturing company, it reports a 12% improvement in model performance (R² = 94%) and a fourfold reduction in prediction latency, that is, latency cut to roughly a quarter of its previous value.

📝 Abstract
This paper introduces a novel end-to-end framework that efficiently integrates data quality assessment with machine learning (ML) model operations in real-time production environments. While existing approaches treat data quality assessment and ML systems as isolated processes, our framework addresses the critical gap between theoretical methods and practical implementation by combining dynamic drift detection, adaptive data quality metrics, and MLOps into a cohesive, lightweight system. The key innovation lies in its operational efficiency, enabling real-time, quality-driven ML decision-making with minimal computational overhead. We validate the framework in a steel manufacturing company's Electroslag Remelting (ESR) vacuum pumping process, demonstrating a 12% improvement in model performance (R² = 94%) and a fourfold reduction in prediction latency. By exploring the impact of data quality acceptability thresholds, we provide actionable insights into balancing data quality standards and predictive performance in industrial applications. This framework represents a significant advancement in MLOps, offering a robust solution for time-sensitive, data-driven decision-making in dynamic industrial environments.
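The abstract does not spell out how a data quality acceptability threshold gates predictions, so the following is only a minimal sketch of the general idea, not the paper's implementation. The names `QualityReport`, `assess_quality`, `gated_predict`, and `mean_model`, the two quality dimensions (completeness and validity), the unweighted average, and the default threshold of 0.8 are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

Row = Sequence[Optional[float]]


@dataclass
class QualityReport:
    completeness: float  # share of non-missing values, in [0, 1]
    validity: float      # share of present values inside [low, high], in [0, 1]

    @property
    def score(self) -> float:
        # Unweighted mean of the two dimensions (an illustrative choice).
        return (self.completeness + self.validity) / 2


def assess_quality(row: Row, low: float, high: float) -> QualityReport:
    """Score one incoming record on two hypothetical quality dimensions."""
    present = [v for v in row if v is not None]
    completeness = len(present) / len(row) if row else 0.0
    in_range = [v for v in present if low <= v <= high]
    validity = len(in_range) / len(present) if present else 0.0
    return QualityReport(completeness, validity)


def gated_predict(
    row: Row,
    model: Callable[[Row], float],
    threshold: float = 0.8,   # hypothetical acceptability threshold
    low: float = 0.0,
    high: float = 100.0,
) -> Tuple[Optional[float], QualityReport]:
    """Run the model only when the record's quality score meets the threshold."""
    report = assess_quality(row, low, high)
    if report.score < threshold:
        return None, report   # reject: data quality too low to trust a prediction
    return model(row), report


def mean_model(row: Row) -> float:
    """Toy stand-in for a trained model: mean of the present values."""
    present = [v for v in row if v is not None]
    return sum(present) / len(present)


clean_row = [10.0, 20.0, 30.0, 40.0]
noisy_row = [None, None, 500.0, 40.0]
print(gated_predict(clean_row, mean_model))
print(gated_predict(noisy_row, mean_model))
```

Raising the threshold rejects more records and lowers it accepts more, which is the trade-off between data quality standards and predictive coverage that the abstract refers to.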
Problem

Research questions and friction points this paper is trying to address.

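Data quality assessment and ML systems are treated as isolated processes, with no real-time integration
A gap remains between theoretical data quality methods and practical MLOps implementation
Industrial applications need higher model performance and lower prediction latency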
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates data quality assessment with ML model operations in a single end-to-end framework
Combines dynamic drift detection, adaptive data quality metrics, and MLOps in one lightweight system
Enables real-time, quality-driven ML decision-making with minimal computational overhead
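This page does not say which drift detector the framework uses. As one common way to flag distribution drift, the sketch below compares a reference window with a current window using the two-sample Kolmogorov-Smirnov statistic, written with the standard library only. The function names `ks_statistic` and `drift_detected` and the threshold of 0.3 are assumptions for illustration, not taken from the paper.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(reference: Sequence[float], current: Sequence[float]) -> float:
    """Largest gap between the empirical CDFs of two non-empty samples."""
    ref = sorted(reference)
    cur = sorted(current)
    # The maximum gap between two step functions occurs at a sample point,
    # so it is enough to evaluate both CDFs at every observed value.
    return max(
        abs(bisect_right(ref, x) / len(ref) - bisect_right(cur, x) / len(cur))
        for x in ref + cur
    )


def drift_detected(
    reference: Sequence[float],
    current: Sequence[float],
    threshold: float = 0.3,  # hypothetical cut-off; tune per feature
) -> bool:
    """Flag drift when the KS statistic exceeds the chosen threshold."""
    return ks_statistic(reference, current) > threshold


reference_window = [1.0, 2.0, 3.0, 4.0]
shifted_window = [3.0, 4.0, 5.0, 6.0]
print(ks_statistic(reference_window, shifted_window))
print(drift_detected(reference_window, shifted_window))
```

In a quality-driven pipeline, a check like this would typically run per feature on a sliding window, with a positive result lowering the quality score or triggering retraining. How the paper wires the two together is not described here.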