🤖 AI Summary
This work identifies an evaluation bias introduced by data augmentation (e.g., SMOTE, mutation-based augmentation) in scarce-data scenarios such as flaky test classification, where augmented samples inadvertently contaminate the test set and compromise the fairness and reliability of model assessment. The authors first empirically validate the phenomenon that allowing augmented data to participate in testing induces systematic evaluation distortion. They then propose a detection framework that disentangles training-induced bias from evaluation-induced bias, and design a bias-calibrated evaluation protocol. Experiments across multiple flaky-test benchmark datasets show that test sets containing augmented samples inflate accuracy by up to 23.7% and introduce F1-score deviations exceeding 0.15. The study establishes both theoretical foundations and practical guidelines for trustworthy model evaluation under data augmentation.
📝 Abstract
Data augmentation has become standard practice in software engineering for addressing limited or imbalanced datasets, particularly in specialized domains such as test classification and bug detection where data is scarce. Although techniques such as SMOTE and mutation-based augmentation are widely used in software testing and debugging applications, a rigorous understanding of how augmented training data affects model bias is lacking. Considering bias is especially critical when augmented datasets are used not only to train models but also to test them. Through a comprehensive case study of flaky test classification, we demonstrate how to test for this bias and show the impact that including augmented samples in test sets can have on model evaluation.
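To make the contamination mechanism concrete, the sketch below contrasts the two evaluation protocols on a toy imbalanced dataset. It uses a hypothetical, naive SMOTE-style interpolator written for illustration (not the paper's framework or the `imbalanced-learn` implementation): augmenting before the train/test split lets synthetic samples leak into the test set, while splitting first and augmenting only the training portion keeps the test set clean.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Toy imbalanced dataset: 90 majority samples, 10 minority ("flaky") samples.
X_maj = rng.normal(0.0, 1.0, size=(90, 4))
X_min = rng.normal(2.0, 1.0, size=(10, 4))
X = np.vstack([X_maj, X_min])
y = np.array([0] * 90 + [1] * 10)

def smote_like(X_minority, n_new, rng):
    """Naive SMOTE-style oversampling: interpolate between random minority pairs."""
    i = rng.integers(0, len(X_minority), size=n_new)
    j = rng.integers(0, len(X_minority), size=n_new)
    lam = rng.random((n_new, 1))
    return X_minority[i] + lam * (X_minority[j] - X_minority[i])

# Biased protocol: augment BEFORE splitting, so synthetic rows land in the test set.
X_syn = smote_like(X_min, 80, rng)
X_aug = np.vstack([X, X_syn])
y_aug = np.concatenate([y, np.ones(80, dtype=int)])
is_synthetic = np.concatenate([np.zeros(len(X), dtype=bool),
                               np.ones(80, dtype=bool)])
_, _, _, _, _, syn_test = train_test_split(
    X_aug, y_aug, is_synthetic, test_size=0.3, random_state=42, stratify=y_aug)
print(f"Synthetic samples in test set (augment-then-split): {syn_test.sum()}")

# Unbiased protocol: split FIRST, then augment only the training portion.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)
X_tr_aug = np.vstack([X_tr, smote_like(X_tr[y_tr == 1], 50, rng)])
print("Synthetic samples in test set (split-then-augment): 0")
```

Under the first protocol the model is scored partly on interpolations of its own training distribution, which is one way augmented-data evaluation can overstate performance; the second protocol preserves a test set of only real samples.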