xRFM: Accurate, scalable, and interpretable feature learning models for tabular data

📅 2025-08-12
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Addressing the longstanding challenge of jointly achieving accuracy, scalability, and interpretability in tabular prediction, this paper proposes xRFM, a model that combines feature-learning kernel machines with a tree-based architecture. The design preserves local adaptivity while enabling efficient training on very large datasets. xRFM is also natively interpretable: the authors use the Average Gradient Outer Product (AGOP), a gradient-based measure of feature importance, to explain predictions without the accuracy-interpretability trade-off typical of GBDTs and neural approaches. Compared against 31 baselines, including the tabular foundation model TabPFNv2 and leading GBDT variants, xRFM achieves the best performance across 100 regression datasets and is competitive with the best methods across 200 classification datasets, while delivering reliable, human-understandable explanations.
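The AGOP mentioned above averages the outer products of the predictor's input gradients over the training points, M = (1/n) Σᵢ ∇f(xᵢ)∇f(xᵢ)ᵀ, so its diagonal gives per-feature importance scores. A minimal sketch of this computation, using finite-difference gradients on a toy stand-in model `f` (not the paper's actual RFM kernel machine):

```python
import numpy as np

def numerical_grad(f, x, eps=1e-5):
    """Central-difference gradient of a scalar-valued f at point x."""
    g = np.zeros_like(x)
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = eps
        g[j] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

def agop(f, X):
    """Average Gradient Outer Product: M = (1/n) * sum_i grad f(x_i) grad f(x_i)^T."""
    n, d = X.shape
    M = np.zeros((d, d))
    for i in range(n):
        g = numerical_grad(f, X[i])
        M += np.outer(g, g)
    return M / n

# Toy example: f depends only on the first of three features,
# so the AGOP diagonal should concentrate on feature 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
f = lambda x: np.sin(x[0])

M = agop(f, X)
importance = np.diag(M)  # per-feature importance scores
```

In practice the gradients of a kernel machine are available in closed form, but the averaging-of-outer-products step is the same.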

📝 Abstract
Inference from tabular data, collections of continuous and categorical variables organized into matrices, is a foundation for modern technology and science. Yet, in contrast to the explosive changes in the rest of AI, the best practice for these predictive tasks has been relatively unchanged and is still primarily based on variations of Gradient Boosted Decision Trees (GBDTs). Very recently, there has been renewed interest in developing state-of-the-art methods for tabular data based on recent developments in neural networks and feature learning methods. In this work, we introduce xRFM, an algorithm that combines feature learning kernel machines with a tree structure to both adapt to the local structure of the data and scale to essentially unlimited amounts of training data. We show that compared to $31$ other methods, including recently introduced tabular foundation models (TabPFNv2) and GBDTs, xRFM achieves best performance across $100$ regression datasets and is competitive with the best methods across $200$ classification datasets, outperforming GBDTs. Additionally, xRFM provides interpretability natively through the Average Gradient Outer Product.
Problem

Research questions and friction points this paper is trying to address.

Improving feature learning for tabular data accuracy and scalability
Enhancing interpretability in tabular data models
Competing with GBDTs and new tabular foundation models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines feature learning with kernel machines
Integrates tree structure for local data adaptation
Scales efficiently to essentially unlimited training data