JoLT: Joint Probabilistic Predictions on Tabular Data Using LLMs

📅 2025-02-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper addresses joint probabilistic modeling of tabular data, including multi-target prediction, heterogeneous variable types (classification and regression), and robustness to missing values. It proposes JoLT, a joint probability modeling framework built on the in-context learning of large language models (LLMs). JoLT requires no model training, data preprocessing, or explicit imputation; instead, it estimates the joint distribution directly by encoding table rows as text and prompting the LLM for probabilistic predictions. Its key contributions are: (1) applying LLMs directly to joint probability modeling of tabular data; (2) automatic handling of missing values and integration of user-supplied textual side information; and (3) unified support for single- and multi-target settings across classification and regression tasks. Experiments show that JoLT outperforms competitive methods on low-shot tasks and can impute missing data using textual side information.
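
The encoding of table rows as text described in the summary can be illustrated with a minimal sketch. The prompt layout, the `->` separator, the choice to omit missing cells, and the helper names `serialize_row` and `build_prompt` are all assumptions made for illustration; the source does not specify JoLT's actual prompt format.

```python
import math


def _is_missing(value):
    """Treat None and float NaN as missing cells."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def serialize_row(features, targets=None):
    """Render one table row as a line of text.

    Assumption: missing cells are simply left out of the text, so no
    imputation step is needed before prompting.
    """
    def fmt(cells):
        return ", ".join(f"{k}: {v}" for k, v in cells.items() if not _is_missing(v))

    line = fmt(features) + " ->"
    if targets is not None:
        line += " " + fmt(targets)
    return line


def build_prompt(side_info, examples, query_features):
    """Concatenate side information, labelled rows, and the unlabelled query row."""
    lines = [side_info] if side_info else []
    lines += [serialize_row(f, t) for f, t in examples]
    lines.append(serialize_row(query_features))
    return "\n".join(lines)


# Hypothetical toy data: one labelled row with a missing cell, one query row.
prompt = build_prompt(
    "Predict house price and type.",
    [({"rooms": 3, "area": None}, {"price": 250.0, "type": "flat"})],
    {"rooms": 4, "area": 90.5},
)
print(prompt)
```

An LLM would then be asked to continue the final line, so that its output distribution over the continuation serves as the predictive distribution for the query row.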

📝 Abstract

We introduce a simple method for probabilistic predictions on tabular data based on Large Language Models (LLMs) called JoLT (Joint LLM Process for Tabular data). JoLT uses the in-context learning capabilities of LLMs to define joint distributions over tabular data conditioned on user-specified side information about the problem, exploiting the vast repository of latent problem-relevant knowledge encoded in LLMs. JoLT defines joint distributions for multiple target variables with potentially heterogeneous data types without any data conversion, data preprocessing, special handling of missing data, or model training, making it accessible and efficient for practitioners. Our experiments show that JoLT outperforms competitive methods on low-shot single-target and multi-target tabular classification and regression tasks. Furthermore, we show that JoLT can automatically handle missing data and perform data imputation by leveraging textual side information. We argue that due to its simplicity and generality, JoLT is an effective approach for a wide variety of real prediction problems.
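
One way to read "joint distributions for multiple target variables" is through the chain rule: the joint probability of two targets factorizes as p(y1, y2 | x) = p(y1 | x) · p(y2 | y1, x), where each factor is scored with the earlier targets already appended to the context. The sketch below assumes this autoregressive reading; `joint_logprob` and `toy_scorer` are hypothetical names, and `toy_scorer` is a hand-written stand-in for an LLM's log-probability of a continuation, not a real model or API.

```python
import math
from typing import Callable, Dict


def joint_logprob(
    prompt: str,
    targets: Dict[str, str],
    cond_logprob: Callable[[str, str], float],
) -> float:
    """Sum conditional log-probabilities of the targets in order (chain rule).

    cond_logprob(context, continuation) is assumed to return the
    log-probability of `continuation` given `context`.
    """
    total = 0.0
    context = prompt
    for name, value in targets.items():
        piece = f" {name}: {value}"
        total += cond_logprob(context, piece)
        # Later targets are conditioned on the targets scored so far.
        context += piece + ","
    return total


def toy_scorer(context: str, continuation: str) -> float:
    """Hypothetical stand-in for an LLM: price depends on the type in context."""
    if continuation.strip().startswith("type:"):
        p = 0.6 if "flat" in continuation else 0.4
    else:
        is_flat = "type: flat" in context
        is_low = "low" in continuation
        p = 0.7 if is_flat == is_low else 0.3
    return math.log(p)


log_p = joint_logprob("rooms: 4 ->", {"type": "flat", "price": "low"}, toy_scorer)
print(math.exp(log_p))
```

Because the second factor sees the first target in its context, the sketch captures dependence between a categorical and a numeric-like target without converting either to a common type.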

Problem

Research questions and friction points this paper is trying to address.

Probabilistic predictions on tabular data

Handling missing data automatically

Outperforming competitive methods on low-shot classification and regression

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-based joint probabilistic predictions

Handles heterogeneous data without preprocessing

Automatic missing data imputation

🔎 Similar Papers

Unleashing the Potential of Large Language Models for Predictive Tabular Tasks in Data Science