Data Presentation Over Architecture: Resampling Strategies for Credit Risk Prediction with Tabular Foundation Models

📅 2026-05-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses credit default prediction, a task marked by severe class imbalance, heterogeneous features, and tight latency budgets, by systematically evaluating how context construction strategies affect Tabular Foundation Models (TFMs). Large-scale experiments on the Home Credit and Lending Club datasets combine seven context construction strategies (including balanced and hybrid sampling), five TFMs, four classical baselines, and context sizes from 1K to 50K. In these highly imbalanced settings, the choice of context strategy explains more variance in AUC-ROC than the choice of TFM family. Balanced and hybrid sampling add 3–4 AUC points over uniform sampling, and the strongest TFMs given a balanced context of only 5K–10K examples reach the AUC of classical baselines trained on the full data, while recovering default-class recall that GBDTs at the default threshold do not.
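To make the contrast between uniform and balanced context construction concrete, here is a minimal sketch in plain Python. The helper name `build_context_indices`, its parameters, and the toy labels are assumptions for illustration; the paper's seven strategies are not specified in this summary, so this should not be read as the authors' implementation.

```python
import random


def build_context_indices(labels, context_size, strategy="balanced", seed=0):
    """Pick row indices for a TFM context window (hypothetical helper).

    'uniform' samples rows at random, so the context inherits the
    dataset's class imbalance. 'balanced' draws the same number of rows
    from every class.
    """
    rng = random.Random(seed)
    n = len(labels)
    if strategy == "uniform":
        return rng.sample(range(n), min(context_size, n))
    if strategy == "balanced":
        by_class = {}
        for i, label in enumerate(labels):
            by_class.setdefault(label, []).append(i)
        per_class = context_size // len(by_class)
        picked = []
        for rows in by_class.values():
            if len(rows) >= per_class:
                picked.extend(rng.sample(rows, per_class))
            else:
                # Too few rows in this class: fall back to sampling with replacement.
                picked.extend(rng.choices(rows, k=per_class))
        rng.shuffle(picked)
        return picked
    raise ValueError(f"unknown strategy: {strategy!r}")


def default_rate(labels, indices):
    """Share of default (label 1) rows among the selected indices."""
    return sum(labels[i] for i in indices) / len(indices)


# Toy labels with an 8% default rate (made up, not from the paper).
labels = [1] * 80 + [0] * 920
balanced = build_context_indices(labels, 100, "balanced")
uniform = build_context_indices(labels, 100, "uniform")
balanced_rate = default_rate(labels, balanced)
uniform_rate = default_rate(labels, uniform)
```

In this toy setup the balanced context contains defaults and non-defaults in equal numbers, while the uniform context stays close to the 8% base rate, so the in-context learner sees far fewer minority examples for the same context size.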

📝 Abstract

Credit default prediction is a tabular learning problem with severe class imbalance, heterogeneous features, and tight latency budgets. Tabular Foundation Models (TFMs) approach this problem through in-context learning, which makes their predictions sensitive to how the context window is built. We benchmark four classical models and five TFMs on the Home Credit and Lending Club datasets, varying the context-construction strategy (seven options) and the context size (1K to 50K). On both datasets, the choice of context strategy explains more variance in AUC-ROC than the choice of TFM family: balanced and hybrid sampling add 3 to 4 AUC points over uniform sampling, and the gap exceeds the spread between TFMs. With a balanced context of 5K to 10K examples, the strongest TFMs reach the AUC of classical baselines trained on the full data, while also recovering meaningful default-class recall that default-threshold GBDTs do not. We frame this as evidence that context construction, rather than architecture choice, is the primary deployment lever for TFMs in imbalanced credit-risk settings.
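The distinction between AUC and default-class recall at a fixed threshold can be illustrated with a small, self-contained sketch. The scores below are invented for illustration and the helper names `auc_roc` and `recall_at` are hypothetical; this is not the paper's evaluation code, only one way a model can rank defaults well yet flag none of them at the conventional 0.5 cutoff.

```python
def auc_roc(y_true, scores):
    """AUC as the chance that a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    # Ties between a positive and a negative count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def recall_at(y_true, scores, threshold=0.5):
    """Fraction of true defaults whose score reaches the threshold."""
    pos_scores = [s for y, s in zip(y_true, scores) if y == 1]
    return sum(s >= threshold for s in pos_scores) / len(pos_scores)


# Made-up scores: defaults rank above every non-default, but all stay below 0.5,
# as can happen when a model is fit on heavily imbalanced data.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.02, 0.03, 0.05, 0.04, 0.06, 0.08, 0.10, 0.07, 0.30, 0.25]

auc = auc_roc(y_true, scores)
recall_default = recall_at(y_true, scores, threshold=0.5)
recall_lowered = recall_at(y_true, scores, threshold=0.2)
```

Here the ranking is perfect, yet recall at the 0.5 threshold is zero; lowering the threshold to 0.2 recovers both defaults. This is why AUC and default-class recall are worth reporting separately in imbalanced credit-risk settings.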

Problem

Research questions and friction points this paper is trying to address.

credit risk prediction

class imbalance

tabular foundation models

context construction

default prediction

Innovation

Methods, ideas, or system contributions that make the work stand out.

Tabular Foundation Models

context construction

resampling strategies