Filtering instances and rejecting predictions to obtain reliable models in healthcare

📅 2025-10-28
🤖 AI Summary
In high-stakes domains such as healthcare, machine learning models often ignore predictive uncertainty and issue predictions even when confidence is low, which undermines reliability. The paper proposes a two-step data-centric framework: (1) during training, problematic instances are filtered out using Instance Hardness (IH) to improve data quality; (2) during inference, predictions whose confidence falls below a threshold are rejected. Influence values (for filtering) and uncertainty (for rejection) serve as baseline criteria. Experiments on three real-world healthcare datasets indicate that combining IH filtering with confidence-based rejection improves model performance while retaining a large proportion of instances, balancing predictive performance against rejection rate. The authors present the approach as a practical option for deploying ML systems in safety-critical applications.

📝 Abstract
Machine Learning (ML) models are widely used in high-stakes domains such as healthcare, where the reliability of predictions is critical. However, these models often fail to account for uncertainty, providing predictions even with low confidence. This work proposes a novel two-step data-centric approach to enhance the performance of ML models by improving data quality and filtering low-confidence predictions. The first step involves leveraging Instance Hardness (IH) to filter problematic instances during training, thereby refining the dataset. The second step introduces a confidence-based rejection mechanism during inference, ensuring that only reliable predictions are retained. We evaluate our approach using three real-world healthcare datasets, demonstrating its effectiveness at improving model reliability while balancing predictive performance and rejection rate. Additionally, we use alternative criteria - influence values for filtering and uncertainty for rejection - as baselines to evaluate the efficiency of the proposed method. The results demonstrate that integrating IH filtering with confidence-based rejection effectively enhances model performance while preserving a large proportion of instances. This approach provides a practical method for deploying ML systems in safety-critical applications.
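The first step, filtering by Instance Hardness, can be illustrated with a minimal sketch. The abstract does not say which hardness measure is used, so this example assumes k-Disagreeing Neighbors (kDN), one common IH measure: the fraction of an instance's k nearest neighbors that carry a different label. The function names, the toy data, `k=3`, and the `0.5` threshold are all hypothetical choices for illustration, not the authors' settings.

```python
import math

def kdn_hardness(X, y, k=3):
    """kDN hardness: fraction of each point's k nearest neighbors
    (Euclidean distance) whose label differs from its own."""
    scores = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        # Distance to every other point, paired with that point's label.
        dists = sorted(
            (math.dist(xi, xj), yj)
            for j, (xj, yj) in enumerate(zip(X, y))
            if j != i
        )
        neighbors = dists[:k]
        scores.append(sum(1 for _, yj in neighbors if yj != yi) / k)
    return scores

def filter_hard_instances(X, y, k=3, threshold=0.5):
    """Keep instances whose hardness is at or below the threshold
    (threshold is an assumed value, not taken from the paper)."""
    ih = kdn_hardness(X, y, k)
    keep = [i for i, h in enumerate(ih) if h <= threshold]
    return [X[i] for i in keep], [y[i] for i in keep], ih

# Toy data: two well-separated clusters plus one point labeled 1
# that sits in the middle of the class-0 cluster.
X = [(0, 0), (0, 1), (1, 0), (1, 1),
     (5, 5), (5, 6), (6, 5), (6, 6),
     (0.5, 0.5)]
y = [0, 0, 0, 0, 1, 1, 1, 1, 1]

X_clean, y_clean, ih = filter_hard_instances(X, y, k=3, threshold=0.5)
print(ih[-1], len(X_clean))
```

On this toy set, the point placed inside the opposite cluster receives the maximum hardness and is the only one removed. A model would then be trained on `X_clean` and `y_clean` instead of the full set.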
Problem

Research questions and friction points this paper is trying to address.

Enhancing ML model reliability in healthcare through data quality improvement
Filtering problematic training instances using instance hardness analysis
Implementing confidence-based rejection for low-quality predictions during inference
Innovation

Methods, ideas, or system contributions that make the work stand out.

Filtering instances using Instance Hardness during training
Rejecting low-confidence predictions during inference
Enhancing model reliability with a data-centric approach
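Rejecting low-confidence predictions at inference time can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: confidence is taken to be the largest predicted class probability, and the `0.8` threshold, the function names, and the toy probabilities are hypothetical.

```python
def predict_with_rejection(probas, threshold=0.8):
    """Return the predicted class index per instance, or None when the
    top class probability is below the threshold (prediction rejected)."""
    decisions = []
    for p in probas:
        confidence = max(p)
        decisions.append(p.index(confidence) if confidence >= threshold else None)
    return decisions

def accuracy_and_rejection_rate(decisions, y_true):
    """Accuracy over accepted predictions, and the share of rejected ones."""
    accepted = [(d, t) for d, t in zip(decisions, y_true) if d is not None]
    rejection_rate = 1 - len(accepted) / len(decisions)
    if not accepted:
        return float("nan"), rejection_rate
    accuracy = sum(d == t for d, t in accepted) / len(accepted)
    return accuracy, rejection_rate

# Toy class-probability outputs for five instances (made-up values).
probas = [[0.95, 0.05], [0.55, 0.45], [0.10, 0.90], [0.40, 0.60], [0.85, 0.15]]
y_true = [0, 1, 1, 0, 0]

# Compare accepting everything against rejecting below 0.8 confidence.
acc_all, rej_all = accuracy_and_rejection_rate(
    predict_with_rejection(probas, threshold=0.0), y_true)
acc_rej, rej_rej = accuracy_and_rejection_rate(
    predict_with_rejection(probas, threshold=0.8), y_true)
print(acc_all, rej_all, acc_rej, rej_rej)
```

In this toy case the two misclassified instances are also the two least confident ones, so rejecting them raises accuracy on the accepted predictions at the cost of a higher rejection rate. Sweeping the threshold traces out the trade-off between predictive performance and rejection rate that the paper evaluates.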
Maria Gabriela Valeriano
Instituto de Computação, Universidade Estadual de Campinas, Av. Albert Einstein, Campinas, 13083-889, São Paulo, Brazil.
David Kohan Marzagão
King's College London
Alfredo Montelongo
Núcleo de Telessaúde, Universidade Federal do Rio Grande do Sul, Rua Dona Laura, Porto Alegre, 10587, Rio Grande do Sul, Brazil.
Carlos Roberto Veiga Kiffer
Escola Paulista de Medicina, Universidade Federal de São Paulo, Rua Botucatu, São Paulo, 04044-020, São Paulo, Brazil.
Natan Katz
Núcleo de Telessaúde, Universidade Federal do Rio Grande do Sul, Rua Dona Laura, Porto Alegre, 10587, Rio Grande do Sul, Brazil.
Ana Carolina Lorena
Instituto Tecnológico de Aeronáutica
Machine Learning, Data Mining, Data Science