Training Neural Networks on Data Sources with Unknown Reliability

📅 2022-12-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenges of unknown source reliability and low-quality data interference in multi-source heterogeneous learning, this paper proposes an unsupervised dynamic reweighting framework. Methodologically: (1) it treats source labels as implicit quality signals and introduces a likelihood-based temperature scaling mechanism for self-supervised reliability estimation; (2) it designs a progressive weight scheduling strategy—initially employing balanced sampling during a warm-up phase, followed by source-aware gradient scaling in later stages. The key innovation lies in the first-ever use of source labels for unsupervised reliability awareness and the realization of end-to-end differentiable dynamic weighting. Extensive experiments on mixed reliable/unreliable data demonstrate accuracy gains of 3.2–7.8% over baselines, while maintaining performance on clean, fully reliable data—confirming the framework’s robustness and generalizability.
📝 Abstract
When data is generated by multiple sources, conventional training methods update models assuming equal reliability for each source and do not consider their individual data quality. However, in many applications, sources have varied levels of reliability that can have negative effects on the performance of a neural network. A key issue is that often the quality of the data for individual sources is not known during training. Previous methods for training models in the presence of noisy data do not make use of the additional information that the source label can provide. Focusing on supervised learning, we aim to train neural networks on each data source for a number of steps proportional to the source's estimated reliability by using a dynamic re-weighting strategy motivated by likelihood tempering. This way, we allow training on all sources during the warm-up and reduce learning on less reliable sources during the final training stages, when it has been shown that models overfit to noise. We show through diverse experiments that this can significantly improve model performance when trained on mixtures of reliable and unreliable data sources, and maintain performance when models are trained on reliable sources only.
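The re-weighting strategy described above can be sketched as follows. This is a minimal illustration, not the paper's exact likelihood-tempering rule: it assumes reliability is estimated from each source's running average loss, and uses a temperature-scaled softmax over negative losses so that higher-loss (presumably noisier) sources receive smaller weights after the warm-up phase. The function name and the softmax form are illustrative assumptions.

```python
import numpy as np

def source_weights(per_source_loss, step, warmup_steps, temperature=1.0):
    """Illustrative per-source weighting (assumption, not the paper's
    exact algorithm): uniform weights during warm-up, then a softmax
    over negative average losses, so lower-loss sources are treated
    as more reliable and weighted more heavily."""
    k = len(per_source_loss)
    if step < warmup_steps:
        # warm-up: balanced treatment of all sources
        return np.full(k, 1.0 / k)
    # temperature controls how sharply unreliable sources are suppressed
    logits = -np.asarray(per_source_loss, dtype=float) / temperature
    logits -= logits.max()  # subtract max for numerical stability
    w = np.exp(logits)
    return w / w.sum()

# toy usage: two low-loss (reliable) sources and one high-loss (noisy) source
losses = [0.4, 0.5, 2.0]
print(source_weights(losses, step=10, warmup_steps=100))   # uniform during warm-up
print(source_weights(losses, step=200, warmup_steps=100))  # noisy source down-weighted
```

The resulting weights could scale per-source gradients or sampling probabilities, matching the abstract's idea of training on each source for a number of steps proportional to its estimated reliability.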
Problem

Research questions and friction points this paper is trying to address.

Neural Networks
Data Reliability
Noisy Data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic Weight Adjustment
Reliability-based Optimization
Quality Mixed Data Performance
Alexander Capstick
Department of Brain Sciences, Imperial College London; Care Research and Technology Centre, UK Dementia Research Institute
Francesca Palermo
Computer Vision Research Associate - EssilorLuxottica
Computer Vision, Machine Learning, AI in Medicine, Haptic Exploration, Signal Processing
Tianyu Cui
Department of Brain Sciences, Imperial College London; Care Research and Technology Centre, UK Dementia Research Institute
P. Barnaghi
Department of Brain Sciences, Imperial College London; Care Research and Technology Centre, UK Dementia Research Institute; Data Research, Innovation and Virtual Environments (DRIVE) Unit, The Great Ormond Street Hospital