🤖 AI Summary
This work addresses the trade-off between the high computational cost of high-performing deep networks and the limited expressive power of classical algorithms by proposing Soft Learning, an interpretable heterogeneous ensemble framework that requires no hyperparameter tuning. The approach maintains a library of diverse base learners (linear models, tree ensembles, kernel methods, and neural networks) and learns non-negative combination weights via cross-validated non-negative least squares, with a stated guarantee that the ensemble matches or exceeds the best weighted combination of its specialists. Evaluated on 37 benchmark datasets (25 classification, 12 regression) against nine methods, it ranks first on 70% of tasks, achieves the best mean rank (Friedman test, p = 1.12 × 10⁻¹²), and is the only method to excel at both classification and regression. It also trains 72–435 times faster than deep networks, using CPU alone.
📝 Abstract
Modern machine learning forces practitioners to choose between powerful but expensive deep networks and fast but limited classical algorithms. Here we introduce Soft Learning, a framework that maintains a library of heterogeneous specialists -- spanning linear models, tree ensembles, kernel machines, and neural networks -- and discovers provably optimal combination weights through cross-validated non-negative least squares. Soft Learning is guaranteed to match or exceed the best weighted combination of its specialists, trains over two orders of magnitude faster than deep networks on CPU alone (72-435x faster across tested configurations), provides inherent interpretability through learned weights that reveal which algorithmic paradigm best fits the data, and is future-proof: adding specialists is mathematically guaranteed to maintain or improve performance. Across 37 datasets (25 classification, 12 regression) against nine methods including CatBoost and tuned deep networks, Soft Learning ranks first on 70% of tasks, achieves the best mean rank (Friedman test, p = 1.12 x 10^-12), and is the only method to simultaneously excel at both classification and regression -- all without GPU hardware or hyperparameter tuning. These results suggest a paradigm shift from "which algorithm is best?" to "what is the provably optimal combination?" -- a question Soft Learning answers with formal guarantees for any data modality.
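The core mechanism named in the abstract, cross-validated non-negative least squares over a library of specialists, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names `fit_nnls_ensemble` and `predict_ensemble`, the three specialists, the synthetic regression data, and the decision not to normalize the weights to sum to one are all assumptions made for illustration. The sketch uses scikit-learn's `cross_val_predict` to obtain out-of-fold predictions and SciPy's `nnls` to solve for the weights.

```python
import numpy as np
from scipy.optimize import nnls
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsRegressor


def fit_nnls_ensemble(specialists, X, y, cv=5):
    """Learn non-negative combination weights from out-of-fold predictions.

    Hypothetical helper: returns (weights, fitted specialists, OOF matrix).
    """
    # Column j holds specialist j's out-of-fold predictions, so the weights
    # are fit on predictions each model made for data it was not trained on.
    oof = np.column_stack(
        [cross_val_predict(model, X, y, cv=cv) for model in specialists]
    )
    # Solve min_w ||oof @ w - y||_2 subject to w >= 0.
    weights, _residual_norm = nnls(oof, y)
    # Refit every specialist on the full training set for later prediction.
    fitted = [model.fit(X, y) for model in specialists]
    return weights, fitted, oof


def predict_ensemble(weights, fitted, X):
    """Weighted sum of the specialists' predictions."""
    preds = np.column_stack([model.predict(X) for model in fitted])
    return preds @ weights


if __name__ == "__main__":
    # Synthetic data and specialist choices are illustrative assumptions.
    X, y = make_regression(n_samples=200, n_features=8, noise=5.0, random_state=0)
    library = [
        Ridge(alpha=1.0),
        RandomForestRegressor(n_estimators=20, random_state=0),
        KNeighborsRegressor(n_neighbors=5),
    ]
    weights, fitted, oof = fit_nnls_ensemble(library, X, y)
    for model, w in zip(library, weights):
        print(f"{type(model).__name__}: weight = {w:.3f}")
```

In this sketch, putting all weight on a single specialist is a feasible solution of the non-negative least squares problem, so the combined squared error on the out-of-fold predictions cannot exceed that of any individual specialist. Adding a column for a new specialist likewise cannot increase that error, since the new weight can be zero. Both properties hold on the data used to fit the weights; the sketch makes no claim about held-out data. The learned weights can be read as an indication of which specialist the data favors.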