Conformal-in-the-Loop for Learning with Imbalanced Noisy Data

📅 2024-11-04

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

career value

163K/year

🤖 AI Summary

To address the degradation of model performance caused by the coexistence of class imbalance and label noise in real-world scenarios, this paper proposes CitL, a closed-loop conformal learning framework. CitL is the first method to embed conformal prediction into the model training loop, enabling end-to-end differentiable uncertainty quantification for dynamic sample reliability assessment. It achieves adaptive sample reweighting and confidence-driven pruning, jointly mitigating the effects of both label noise and class imbalance—without requiring auxiliary data cleaning or resampling modules. Extensive experiments demonstrate that CitL improves classification accuracy by up to 6.1% and semantic segmentation mIoU by 5.0%, significantly outperforming existing denoising and balancing approaches. The source code is publicly available.

Technology Category

Application Category

📝 Abstract

Class imbalance and label noise are pervasive in large-scale datasets, yet much of machine learning research assumes well-labeled, balanced data, which rarely reflects real world conditions. Existing approaches typically address either label noise or class imbalance in isolation, leading to suboptimal results when both issues coexist. In this work, we propose Conformal-in-the-Loop (CitL), a novel training framework that addresses both challenges with a conformal prediction-based approach. CitL evaluates sample uncertainty to adjust weights and prune unreliable examples, enhancing model resilience and accuracy with minimal computational cost. Our extensive experiments include a detailed analysis showing how CitL effectively emphasizes impactful data in noisy, imbalanced datasets. Our results show that CitL consistently boosts model performance, achieving up to a 6.1% increase in classification accuracy and a 5.0 mIoU improvement in segmentation. Our code is publicly available: CitL.

Problem

Research questions and friction points this paper is trying to address.

Imbalanced Datasets

Label Noise

Machine Learning Performance

Innovation

Methods, ideas, or system contributions that make the work stand out.

Conformal-in-the-Loop

Imbalanced Datasets

Label Noise

🔎 Similar Papers

Unsupervised Machine Learning Hybrid Approach Integrating Linear Programming in Loss Function: A Robust Optimization Technique