MLCBART: Multilabel Classification with Bayesian Additive Regression Trees

📅 2026-01-13

🤖 AI Summary
This work addresses the challenge of multi-label classification, where complex inter-label dependencies and highly nonlinear relationships with features hinder effective modeling by existing methods. The authors propose the first extension of Bayesian Additive Regression Trees (BART) to multi-label classification by introducing latent continuous variables that are thresholded to produce discrete labels, while explicitly capturing label correlations through a multivariate normal distribution. This unified framework jointly models nonlinear effects, label dependencies, and predictive uncertainty, with inference performed via Markov chain Monte Carlo (MCMC). Experimental results demonstrate that the proposed model significantly outperforms current approaches on simulated data, achieving prediction accuracy nearly matching that of an oracle model, and provides well-calibrated conditional probabilities for label combinations along with reliable uncertainty quantification.

📝 Abstract
Multilabel Classification (MLC) deals with the simultaneous classification of multiple binary labels. The task is challenging because, not only may there be arbitrarily different and complex relationships between predictor variables and each label, but associations among labels may exist even after accounting for effects of predictor variables. In this paper, we present a Bayesian additive regression tree (BART) framework to model the problem. BART is a nonparametric and flexible model structure capable of uncovering complex relationships within the data. Our adaptation, MLCBART, assumes that labels arise from thresholding an underlying numeric scale, where a multivariate normal model allows explicit estimation of the correlation structure among labels. This enables the discovery of complicated relationships in various forms and improves MLC predictive performance. Our Bayesian framework not only enables uncertainty quantification for each predicted label, but our MCMC draws produce an estimated conditional probability distribution of label combinations for any predictor values. Simulation experiments demonstrate the effectiveness of the proposed model by comparing its performance with a set of models, including the oracle model with the correct functional form. Results show that our model predicts vectors of labels more accurately than other contenders and its performance is close to the oracle model. An example highlights how the method's ability to produce measures of uncertainty on predictions provides nuanced understanding of classification results.
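The latent-variable formulation the abstract describes can be sketched as a data-generating process: each label arises from thresholding a latent continuous score at zero, and the latent residuals follow a multivariate normal distribution so that labels remain correlated even after conditioning on the predictors. The sketch below is illustrative only; in MLCBART the mean functions would be sums of Bayesian regression trees, whereas simple nonlinear functions stand in here, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 1000, 2
X = rng.uniform(-1, 1, size=(n, p))

# Nonlinear mean functions for two labels (stand-ins for BART sums-of-trees).
f = np.column_stack([np.sin(3 * X[:, 0]), X[:, 1] ** 2 - 0.3])

# Residual correlation among labels, beyond what the predictors explain.
Sigma = np.array([[1.0, 0.6],
                  [0.6, 1.0]])
eps = rng.multivariate_normal(np.zeros(2), Sigma, size=n)

Z = f + eps               # latent continuous scale
Y = (Z > 0).astype(int)   # thresholding yields the binary label vector

# Association among labels persists after removing the predictor effects:
resid_corr = np.corrcoef((Z - f).T)[0, 1]
print(resid_corr)  # near the specified 0.6
```

Inference in the paper reverses this sketch: MCMC draws of the latent scores, tree ensembles, and correlation matrix yield both label predictions and an estimated conditional probability distribution over label combinations.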
Problem

Research questions and friction points this paper is trying to address.

Multilabel Classification
Label Correlation
Bayesian Additive Regression Trees
Uncertainty Quantification
MCMC
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multilabel Classification
Bayesian Additive Regression Trees
Label Correlation
Uncertainty Quantification
Nonparametric Modeling
Jiahao Tian
Department of Statistics and Actuarial Science, Simon Fraser University
Hugh Chipman
Professor, Acadia University
Industrial statistics, statistical learning, Bayesian methods, statistical computation, data science
Thomas Loughin
Department of Statistics and Actuarial Science, Simon Fraser University