🤖 AI Summary
This work addresses the high computational cost and slow training of sparse oblique random forests, which stem from the need to sort or construct histograms over linear combinations of features. To overcome this limitation, the authors propose an adaptive selection mechanism that dynamically switches between sorting and histogram-based strategies to efficiently identify optimal splits, and they further accelerate histogram construction with vectorized instructions. A hybrid CPU-GPU implementation is also introduced to enable GPU acceleration. This approach is presented as the first integration of adaptive histogram selection and vectorization optimizations into oblique random forests. Experimental results on large-scale datasets show 1.7-2.5x faster training than existing oblique forest algorithms and 1.5-2x faster training than standard random forests.
📝 Abstract
Classification using sparse oblique random forests provides guarantees on uncertainty and confidence while controlling for specific error types. However, these forests use more data and more compute than other tree ensembles because they grow deep trees and must sort, or build histograms over, linear combinations of features at runtime. We provide a method that dynamically switches between histogramming and sorting to find the best split, and we further optimize histogram construction using vector intrinsics. On large datasets, our optimizations speed up training by 1.7-2.5x compared to existing oblique forests and 1.5-2x compared to standard random forests. We also provide a GPU and a hybrid CPU-GPU implementation.
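The core idea of the abstract — switching between an exact sort-and-scan and an approximate histogram scan when splitting on a projected (oblique) feature — can be illustrated with a minimal sketch. The node-size threshold, bin count, and Gini-gain criterion below are illustrative assumptions, not the paper's exact heuristic; the paper's vectorized and GPU variants are not reproduced here.

```python
import numpy as np

def best_split(values, labels, hist_threshold=256, n_bins=32):
    """Find a binary-split threshold on 1-D projected values (0/1 labels).

    Adaptive strategy (hypothetical heuristic): exact sorting for small
    nodes, fixed-width histograms for large ones.
    """
    values = np.asarray(values, dtype=np.float64)
    labels = np.asarray(labels, dtype=np.int64)
    n = len(values)
    total_pos = int(labels.sum())

    def gini_gain(left_pos, left_n):
        # Impurity decrease of splitting the node into left/right parts.
        right_pos, right_n = total_pos - left_pos, n - left_n
        if left_n == 0 or right_n == 0:
            return -np.inf
        gini = lambda pos, cnt: 2.0 * (pos / cnt) * (1.0 - pos / cnt)
        parent = gini(total_pos, n)
        child = (left_n * gini(left_pos, left_n)
                 + right_n * gini(right_pos, right_n)) / n
        return parent - child

    best = (-np.inf, None)  # (gain, threshold)
    if n <= hist_threshold:
        # Exact strategy: sort projections, scan every candidate split.
        order = np.argsort(values)
        v, y = values[order], labels[order]
        left_pos = 0
        for i in range(n - 1):
            left_pos += y[i]
            if v[i] == v[i + 1]:
                continue  # no valid threshold between equal values
            gain = gini_gain(left_pos, i + 1)
            if gain > best[0]:
                best = (gain, 0.5 * (v[i] + v[i + 1]))
    else:
        # Approximate strategy: bucket projections into fixed-width bins,
        # then scan only the bin boundaries.
        lo, hi = values.min(), values.max()
        if lo == hi:
            return best
        bins = np.minimum(((values - lo) / (hi - lo) * n_bins).astype(int),
                          n_bins - 1)
        counts = np.bincount(bins, minlength=n_bins)
        pos = np.bincount(bins, weights=labels, minlength=n_bins)
        left_n, left_pos = 0, 0.0
        for b in range(n_bins - 1):
            left_n += counts[b]
            left_pos += pos[b]
            gain = gini_gain(left_pos, left_n)
            if gain > best[0]:
                best = (gain, lo + (b + 1) * (hi - lo) / n_bins)
    return best
```

Sorting costs O(n log n) per node but finds the exact best threshold; histogramming costs O(n + n_bins) and is cheaper for large nodes, which is why a size-based switch can pay off.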