Make Still Further Progress: Chain of Thoughts for Tabular Data Leaderboard

📅 2025-05-19
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Tabular model performance degrades significantly under distributional shifts, and existing ensemble methods rely on static weighting schemes lacking instance-level adaptability. To address this, we propose a context-aware tabular ensemble framework that constructs an instance-level contextual neighborhood via k-nearest neighbors, without accessing raw features, and fuses predictions from multiple base models. Crucially, the framework leverages large language models (LLMs) via Chain of Tabular Thoughts (CoT²) prompting, enabling dynamic, interpretable, reasoning-driven ensemble prediction. This work pioneers the integration of multi-step, chain-of-thought reasoning into tabular ensemble learning, overcoming the rigidity of statically weighted ensembles. Extensive experiments across standard tabular benchmarks demonstrate that the method consistently outperforms fine-tuned single models and state-of-the-art ensemble baselines, achieving superior robustness and expert-level generalization.

📝 Abstract
Tabular data, a fundamental data format in machine learning, is predominantly utilized in competitions and real-world applications. The performance of tabular models, such as gradient boosted decision trees and neural networks, can vary significantly across datasets due to differences in feature distributions and task characteristics. Achieving top performance on each dataset often requires specialized expert knowledge. To address this variability, practitioners often aggregate the predictions of multiple models. However, conventional aggregation strategies typically rely on static combination rules and lack instance-level adaptability. In this work, we propose an in-context ensemble framework for tabular prediction that leverages large language models (LLMs) to perform dynamic, instance-specific integration of external model predictions. Without access to raw tabular features or semantic information, our method constructs a context around each test instance using its nearest neighbors and the predictions from a pool of external models. Within this enriched context, we introduce Chain of Tabular Thoughts (CoT$^2$), a prompting strategy that guides LLMs through multi-step, interpretable reasoning, making still further progress toward expert-level decision-making. Experimental results show that our method outperforms well-tuned baselines and standard ensemble techniques across a wide range of tabular datasets.
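The abstract's context-construction step, retrieving a test instance's nearest neighbors and pairing each neighbor's true label with every base model's prediction for it, can be sketched as follows. All names here (`knn_context`, the toy data, the model names) are illustrative assumptions, not details from the paper:

```python
import math

def knn_context(test_x, train_X, train_y, model_preds, k=3):
    """Build an in-context neighborhood for one test instance.

    test_x      : feature vector of the test instance
    train_X     : list of training feature vectors
    train_y     : true labels for the rows of train_X
    model_preds : {model_name: [prediction for each training row]}
    k           : number of nearest neighbors to include

    Returns the k nearest rows with their true labels and each base
    model's prediction: evidence an LLM can reason over without ever
    seeing raw feature semantics.
    """
    dists = [(math.dist(test_x, x), i) for i, x in enumerate(train_X)]
    dists.sort()  # ascending distance; ties broken by row index
    context = []
    for d, i in dists[:k]:
        context.append({
            "distance": round(d, 3),
            "label": train_y[i],
            "predictions": {m: p[i] for m, p in model_preds.items()},
        })
    return context

# Toy example: two base models, four training rows.
train_X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
train_y = [0, 0, 1, 1]
model_preds = {
    "gbdt": [0, 0, 1, 1],
    "mlp":  [0, 1, 1, 1],
}
ctx = knn_context([0.1, 0.0], train_X, train_y, model_preds, k=2)
for row in ctx:
    print(row)
```

In this sketch the LLM would then receive `ctx` serialized into a prompt; where the models agree on the neighbors but disagree on the test instance, the neighbor labels act as a local reliability signal.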
Problem

Research questions and friction points this paper is trying to address.

Dynamic ensemble framework for tabular data variability
Instance-specific model prediction integration using LLMs
Improving performance without raw features or semantic info
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-based dynamic ensemble for tabular predictions
Context construction using nearest neighbors
Chain of Tabular Thoughts prompting strategy
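The Chain of Tabular Thoughts idea, guiding the LLM through explicit reasoning steps over neighbor evidence before it commits to an ensemble decision, might be assembled into a prompt along these lines. This is a hypothetical template for illustration; the paper's actual prompt wording is not reproduced here:

```python
def build_cot2_prompt(context_rows, test_preds):
    """Assemble a hypothetical CoT^2-style prompt: neighbor evidence,
    per-model test predictions, then step-by-step instructions."""
    lines = ["You are aggregating predictions from several tabular models."]
    lines.append("Neighbor evidence (true label vs. each model's prediction):")
    for i, row in enumerate(context_rows, 1):
        preds = ", ".join(f"{m}={p}" for m, p in row["predictions"].items())
        lines.append(f"  neighbor {i}: label={row['label']}; {preds}")
    preds = ", ".join(f"{m}={p}" for m, p in test_preds.items())
    lines.append(f"Test instance predictions: {preds}")
    # Multi-step reasoning instructions: the chain-of-thought part.
    lines.append("Step 1: Judge each model's reliability on the neighbors.")
    lines.append("Step 2: Weigh the test-instance predictions accordingly.")
    lines.append("Step 3: Output the final label with a brief justification.")
    return "\n".join(lines)

prompt = build_cot2_prompt(
    [{"label": 0, "predictions": {"gbdt": 0, "mlp": 1}}],
    {"gbdt": 0, "mlp": 1},
)
print(prompt)
```

The point of the staged instructions is that the model's ensemble weights become instance-specific and readable: the justification produced in Step 3 explains why one base model was trusted over another for this particular test point.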
Si-Yang Liu
Nanjing University
Machine Learning, Tabular Data, LLMs

Qi-Le Zhou
School of Artificial Intelligence, Nanjing University, China; National Key Laboratory for Novel Software Technology, Nanjing University

Han-Jia Ye
Nanjing University
Machine Learning, Data Mining, Metric Learning, Meta-Learning