🤖 AI Summary
This work addresses structural oscillation and inconsistent analytical conclusions arising from frequent model retraining in continual learning. To this end, we propose a model-agnostic stable sequential modeling paradigm. Methodologically, we formulate a Pareto-optimal mixed-integer optimization framework that jointly optimizes predictive performance and cross-iteration structural stability. We introduce an interpretability-driven, task-adaptive distance metric—compatible with both tree-based models and neural networks—and design a polynomial-time approximation algorithm coupled with a stability regularization mechanism. Extensive evaluation across medical, vision, and language domains—including real-world clinical deployment in hospital production environments—demonstrates that our approach incurs only a 2% average accuracy degradation while improving structural stability by 30%. To the best of our knowledge, this is the first stability-preserving retraining framework validated in clinical practice.
📝 Abstract
We consider the problem of retraining machine learning (ML) models when new batches of data become available. Existing approaches greedily optimize for predictive power independently at each batch, without considering the stability of the model's structure or analytical insights across retraining iterations. We propose a model-agnostic framework for finding sequences of models that are stable across retraining iterations. We develop a mixed-integer optimization formulation that is guaranteed to recover Pareto optimal models (in terms of the predictive power-stability trade-off) with good generalization properties, as well as an efficient polynomial-time algorithm that performs well in practice. We focus on retaining consistent analytical insights-which is important to model interpretability, ease of implementation, and fostering trust with users-by using custom-defined distance metrics that can be directly incorporated into the optimization problem. We evaluate our framework across models (regression, decision trees, boosted trees, and neural networks) and application domains (healthcare, vision, and language), including deployment in a production pipeline at a major US hospital. We find that, on average, a 2% reduction in predictive power leads to a 30% improvement in stability.