Training Neural Networks on Data Sources with Unknown Reliability

📅 2022-12-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenges of unknown source reliability and low-quality data interference in multi-source heterogeneous learning, this paper proposes an unsupervised dynamic reweighting framework. Methodologically: (1) it treats source labels as implicit quality signals and introduces a likelihood-based temperature scaling mechanism for self-supervised reliability estimation; (2) it designs a progressive weight scheduling strategy—initially employing balanced sampling during a warm-up phase, followed by source-aware gradient scaling in later stages. The key innovation lies in the first-ever use of source labels for unsupervised reliability awareness and the realization of end-to-end differentiable dynamic weighting. Extensive experiments on mixed reliable/unreliable data demonstrate accuracy gains of 3.2–7.8% over baselines, while maintaining performance on clean, fully reliable data—confirming the framework’s robustness and generalizability.
📝 Abstract
When data is generated by multiple sources, conventional training methods update models assuming equal reliability for each source and do not consider their individual data quality. However, in many applications, sources have varied levels of reliability that can have negative effects on the performance of a neural network. A key issue is that often the quality of the data for individual sources is not known during training. Previous methods for training models in the presence of noisy data do not make use of the additional information that the source label can provide. Focusing on supervised learning, we aim to train neural networks on each data source for a number of steps proportional to the source's estimated reliability by using a dynamic re-weighting strategy motivated by likelihood tempering. This way, we allow training on all sources during the warm-up and reduce learning on less reliable sources during the final training stages, when it has been shown that models overfit to noise. We show through diverse experiments that this can significantly improve model performance when trained on mixtures of reliable and unreliable data sources, and maintain performance when models are trained on reliable sources only.
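The re-weighting strategy described above can be sketched as follows. This is a minimal illustration, not the paper's exact likelihood-tempering rule: it assumes reliability is estimated from each source's running average loss, and uses a temperature-scaled softmax over negative losses so that higher-loss (presumably noisier) sources receive smaller weights after the warm-up phase. The function name and the softmax form are illustrative assumptions.

```python
import numpy as np

def source_weights(per_source_loss, step, warmup_steps, temperature=1.0):
    """Illustrative per-source weighting (assumption, not the paper's
    exact algorithm): uniform weights during warm-up, then a softmax
    over negative average losses, so lower-loss sources are treated
    as more reliable and weighted more heavily."""
    k = len(per_source_loss)
    if step < warmup_steps:
        # warm-up: balanced treatment of all sources
        return np.full(k, 1.0 / k)
    # temperature controls how sharply unreliable sources are suppressed
    logits = -np.asarray(per_source_loss, dtype=float) / temperature
    logits -= logits.max()  # subtract max for numerical stability
    w = np.exp(logits)
    return w / w.sum()

# toy usage: two low-loss (reliable) sources and one high-loss (noisy) source
losses = [0.4, 0.5, 2.0]
print(source_weights(losses, step=10, warmup_steps=100))   # uniform during warm-up
print(source_weights(losses, step=200, warmup_steps=100))  # noisy source down-weighted
```

The resulting weights could scale per-source gradients or sampling probabilities, matching the abstract's idea of training on each source for a number of steps proportional to its estimated reliability.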
Problem

Research questions and friction points this paper is trying to address.

Neural Networks
Data Reliability
Noisy Data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic Weight Adjustment
Reliability-based Optimization
Quality Mixed Data Performance
Alexander Capstick
Department of Brain Sciences, Imperial College London; Care Research and Technology Centre, UK Dementia Research Institute
Francesca Palermo
Computer Vision Research Associate - EssilorLuxottica
Computer Vision, Machine Learning, AI in Medicine, Haptic Exploration, Signal Processing
Tianyu Cui
Department of Brain Sciences, Imperial College London; Care Research and Technology Centre, UK Dementia Research Institute
P. Barnaghi
Department of Brain Sciences, Imperial College London; Care Research and Technology Centre, UK Dementia Research Institute; Data Research, Innovation and Virtual Environments (DRIVE) Unit, The Great Ormond Street Hospital