Vectorized Adaptive Histograms for Sparse Oblique Forests

📅 2026-02-27
🤖 AI Summary
This work addresses the high computational cost and slow training of sparse oblique random forests. Both stem from having to sort or build histograms over linear combinations of features during training. To reduce this cost, the authors propose an adaptive mechanism that dynamically switches between sorting and histogram-based strategies when searching for the best split. They further accelerate histogram construction with vectorized instructions. Both a GPU implementation and a hybrid CPU-GPU implementation are also provided. This approach represents the first integration of adaptive histogram selection and vectorization optimizations into oblique random forests. On large-scale datasets, the method trains 1.7–2.5× faster than existing oblique forest algorithms and 1.5–2× faster than standard random forests.
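To make "linear combinations of features" concrete, here is a minimal Python sketch of how a sparse oblique forest might form one candidate split direction. It samples a few features, gives each a random weight of +1 or -1, and projects every sample onto that direction. The ±1 weighting scheme, the `density` parameter, and the helper names `sample_sparse_direction` and `project` are illustrative assumptions, not the authors' code. The projected values are what must then be sorted or histogrammed at each node.

```python
import random


def sample_sparse_direction(n_features, density, rng):
    """Hypothetical helper: draw a sparse random direction.

    Each feature is included with probability `density` and, if included,
    receives a weight of +1 or -1 (an assumed scheme, for illustration only).
    Returns a dict mapping feature index -> weight.
    """
    direction = {}
    for j in range(n_features):
        if rng.random() < density:
            direction[j] = rng.choice((-1.0, 1.0))
    if not direction:
        # Guarantee at least one nonzero weight so the projection is not constant.
        direction[rng.randrange(n_features)] = 1.0
    return direction


def project(rows, direction):
    """Compute the linear combination sum_j w_j * x_j for every row.

    Only the nonzero weights are visited, which keeps the projection
    cheap when the direction is sparse.
    """
    return [sum(w * row[j] for j, w in direction.items()) for row in rows]


if __name__ == "__main__":
    rng = random.Random(0)
    rows = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
    d = sample_sparse_direction(n_features=3, density=0.5, rng=rng)
    print(d, project(rows, d))
```

Because only the sampled features contribute, each projection costs time proportional to the number of nonzero weights per row rather than to the full feature count.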

📝 Abstract
Sparse oblique random forests provide classification with guarantees on uncertainty and confidence while controlling for specific error types. However, they use more data and more compute than other tree ensembles because they build deep trees and must sort or histogram linear combinations of the data at runtime. We provide a method that dynamically switches between histograms and sorting to find the best split, and we further optimize histogram construction using vector intrinsics. Evaluated on large datasets, our optimizations speed up training by 1.7-2.5x compared to existing oblique forests and 1.5-2x compared to standard random forests. We also provide a GPU implementation and a hybrid CPU-GPU implementation.
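The dynamic switch between sorting and histograms can be sketched as follows. This is a minimal Python illustration under stated assumptions. Splits are scored by weighted Gini impurity, the histogram path uses equal-width bins, and the switch is a simple node-size cutoff. The constants `SORT_CUTOFF` and `N_BINS` and the function names are hypothetical. The paper's actual selection rule, bin layout, and vectorized kernels are not reproduced here. In both paths a sample goes left when its projected value is below the returned threshold.

```python
from collections import Counter

SORT_CUTOFF = 64  # hypothetical: nodes smaller than this use exact sorting
N_BINS = 16       # hypothetical histogram resolution


def gini(counts, total):
    """Gini impurity of a class-count mapping holding `total` samples."""
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def best_split_sorted(values, labels):
    """Exact search: sort once, then sweep every boundary between distinct values."""
    order = sorted(range(len(values)), key=values.__getitem__)
    n = len(values)
    left, right = Counter(), Counter(labels)
    best = (float("inf"), None)
    for i in range(n - 1):
        y = labels[order[i]]
        left[y] += 1
        right[y] -= 1
        v, v_next = values[order[i]], values[order[i + 1]]
        if v == v_next:
            continue  # no threshold can separate equal values
        n_left = i + 1
        score = (n_left * gini(left, n_left) + (n - n_left) * gini(right, n - n_left)) / n
        if score < best[0]:
            best = (score, (v + v_next) / 2)
    return best


def best_split_histogram(values, labels, n_bins=N_BINS):
    """Approximate search: one binning pass, then a sweep over bin edges only."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return (float("inf"), None)
    width = (hi - lo) / n_bins
    hist = [Counter() for _ in range(n_bins)]
    for v, y in zip(values, labels):
        # Per-bin class counts; the maximum value is clamped into the last bin.
        hist[min(int((v - lo) / width), n_bins - 1)][y] += 1
    n = len(values)
    left, right = Counter(), Counter(labels)
    n_left = 0
    best = (float("inf"), None)
    for b in range(n_bins - 1):
        left.update(hist[b])
        right.subtract(hist[b])
        n_left += sum(hist[b].values())
        if n_left == 0 or n_left == n:
            continue  # one side would be empty
        score = (n_left * gini(left, n_left) + (n - n_left) * gini(right, n - n_left)) / n
        if score < best[0]:
            best = (score, lo + (b + 1) * width)
    return best


def best_split(values, labels):
    """Adaptive choice: exact sorting for small nodes, histograms for large ones."""
    if len(values) < SORT_CUTOFF:
        return best_split_sorted(values, labels)
    return best_split_histogram(values, labels)
```

Sorting costs O(n log n) per candidate direction, whereas binning costs O(n) plus an O(B) sweep over B bins. A size-based cutoff is therefore one plausible way to decide when the cheaper but approximate histogram pays off. The paper's own criterion may differ.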

Problem

Research questions and friction points this paper is trying to address.

sparse oblique random forests
histogram construction
computational efficiency
tree ensembles
runtime optimization

Innovation

Methods, ideas, or system contributions that make the work stand out.

sparse oblique random forests
vectorized histograms
dynamic split selection
CPU-GPU hybrid implementation
vector intrinsics