🤖 AI Summary
Gradient-boosted trees excel on tabular data but often leave a long tail of poorly predicted instances that are hard to identify in advance. This work proposes a prediction-trajectory-based instance difficulty score, the Trajectory-based Difficulty Score (TDS). TDS summarizes each instance's cumulative prediction trajectory across trees with descriptors such as variance, oscillation peaks, sign switches, and tail stability, and uses them to train a lightweight regression model that predicts held-out loss. The resulting scores are calibrated via the empirical cumulative distribution function (ECDF) into difficulty estimates in [0,1]. A single TDS signal supports active learning, selective prediction, and stratified conformal prediction, while clustering high-TDS instances by their SHAP attributions reveals interpretable failure modes. Experiments show that TDS has strong rank correlation with prediction error across diverse tabular datasets, outperforms existing instance-hardness and uncertainty baselines on classification, remains competitive on regression, and improves label efficiency, risk-coverage trade-offs, and the uniformity of conditional coverage.
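The trajectory descriptors and ECDF calibration described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes a trajectory is a list of one instance's cumulative raw predictions after each tree (obtainable, for example, from scikit-learn's `staged_decision_function` or `staged_predict`). The function names, the peak and sign-switch definitions, the `tail_frac` parameter, and the use of tail range as a stability measure are hypothetical choices. The lightweight regression model that maps these features to held-out loss is omitted; `ecdf_calibrate` takes that model's raw output together with a reference set of raw outputs.

```python
from bisect import bisect_right
from statistics import pvariance


def trajectory_features(traj, tail_frac=0.2):
    """Hypothetical descriptors of one cumulative prediction trajectory.

    traj[t] is assumed to be the ensemble's raw prediction after the first t+1 trees.
    """
    steps = [b - a for a, b in zip(traj, traj[1:])]  # per-tree increments
    # Oscillation peaks: interior points strictly higher than both neighbours.
    peaks = sum(
        1 for i in range(1, len(traj) - 1) if traj[i - 1] < traj[i] > traj[i + 1]
    )
    # Sign switches: consecutive non-zero increments with opposite signs.
    nonzero = [s for s in steps if s != 0]
    switches = sum(1 for a, b in zip(nonzero, nonzero[1:]) if (a > 0) != (b > 0))
    # Tail stability (assumed measure): range over the last tail_frac of trees;
    # a smaller range means the prediction has settled.
    k = max(2, int(round(tail_frac * len(traj))))
    tail = traj[-k:]
    return {
        "variance": pvariance(traj),
        "peaks": peaks,
        "sign_switches": switches,
        "tail_range": max(tail) - min(tail),
    }


def ecdf_calibrate(reference, raw):
    """Map a raw difficulty prediction to [0, 1] via the ECDF of reference raw scores."""
    ref = sorted(reference)
    return bisect_right(ref, raw) / len(ref)


if __name__ == "__main__":
    feats = trajectory_features([0.0, 0.5, 0.2, 0.6, 0.55, 0.58])
    print(feats["peaks"], feats["sign_switches"])  # 2 4
    print(ecdf_calibrate([0.1, 0.2, 0.3, 0.4], 0.25))  # 0.5
```

Because the ECDF is monotone, calibration preserves the ranking produced by the regression model while making scores comparable across datasets and ensemble sizes.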
📝 Abstract
Gradient-boosted trees achieve strong performance on tabular data, yet often leave a long tail of poorly predicted instances. We introduce a Trajectory-based Difficulty Score (TDS), an instance-level difficulty estimator for boosted ensembles derived from per-tree cumulative prediction trajectories. For each instance, we compute interpretable trajectory descriptors (e.g., variance, oscillation peaks, sign switches, and tail stability) and train a lightweight regression model to predict held-out loss. An empirical CDF calibrates the resulting signal into a score in $[0,1]$ that supports ranking hard cases. Across diverse tabular benchmarks and ensemble sizes, TDS exhibits strong rank correlation with error and outperforms established instance-hardness and uncertainty baselines on classification, while remaining competitive on regression. We then show how a single difficulty signal improves multiple data mining workflows: difficulty-driven active learning for label-efficient training, difficulty-thresholded selective prediction for improved risk-coverage trade-offs, and TDS-stratified (Mondrian) conformal prediction for more uniform conditional coverage. Finally, clustering high-TDS instances using SHAP attributions reveals coherent failure modes characterized by compact feature-value ranges, supporting error analysis and targeted data acquisition.
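Two of the downstream uses of the calibrated score can be sketched in a few lines. The code below is a hedged illustration, not the paper's protocol. It assumes a regression setting with absolute-residual nonconformity scores, equal-width TDS strata, and a calibration set held out from training. All function names (`tds_bin`, `mondrian_quantiles`, `predict_interval`, `risk_coverage`) are hypothetical. Mondrian conformal prediction computes a separate split-conformal quantile within each TDS stratum, so harder strata receive wider intervals. Selective prediction abstains on instances whose TDS exceeds a threshold.

```python
import math


def tds_bin(score, n_bins):
    """Map a TDS in [0, 1] to one of n_bins equal-width strata (hypothetical scheme)."""
    return min(int(score * n_bins), n_bins - 1)


def mondrian_quantiles(cal_residuals, cal_scores, n_bins=3, alpha=0.1):
    """Split-conformal quantile of |y - y_hat| computed separately per TDS stratum.

    Assumes 0 < alpha < 1 and a calibration set disjoint from the training data.
    """
    groups = {b: [] for b in range(n_bins)}
    for r, s in zip(cal_residuals, cal_scores):
        groups[tds_bin(s, n_bins)].append(abs(r))
    q = {}
    for b, res in groups.items():
        n = len(res)
        rank = math.ceil((n + 1) * (1 - alpha))  # finite-sample corrected rank
        # Too few calibration points in this stratum: the interval is unbounded.
        q[b] = sorted(res)[rank - 1] if rank <= n else math.inf
    return q


def predict_interval(y_hat, score, q, n_bins=3):
    """Interval y_hat +/- the quantile of the test point's TDS stratum."""
    half = q[tds_bin(score, n_bins)]
    return y_hat - half, y_hat + half


def risk_coverage(losses, scores, threshold):
    """Selective prediction: abstain when TDS > threshold.

    Returns (coverage, mean loss on accepted instances); risk is 0.0 if none are kept.
    """
    kept = [loss for loss, s in zip(losses, scores) if s <= threshold]
    coverage = len(kept) / len(losses)
    risk = sum(kept) / len(kept) if kept else 0.0
    return coverage, risk
```

Sweeping `threshold` over [0, 1] traces out a risk-coverage curve. With few calibration points per stratum, a stratum's quantile can be infinite, which is one reason to keep the number of strata small.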