🤖 AI Summary
Tabular model performance degrades significantly under distributional shift, and existing ensemble methods rely on static weighting schemes that lack instance-level adaptability. To address this, we propose the Context-Aware Tabular Ensemble (CATE), a framework that constructs an instance-level contextual neighborhood for each test example via k-nearest neighbors (without accessing raw features) and fuses the predictions of multiple base models. Crucially, CATE leverages large language models (LLMs) with Chain of Tabular Thoughts (CoT²) prompting, enabling dynamic, interpretable, and reasoning-driven ensemble prediction. This work pioneers the integration of multi-step, chain-of-thought reasoning into tabular ensemble learning, overcoming the rigidity of conventional zero-shot ensembles. Extensive experiments on standard tabular benchmarks show that CATE consistently outperforms fine-tuned single models and state-of-the-art ensemble baselines, achieving superior robustness and expert-level generalization.
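The neighborhood-construction step can be sketched as follows. This is a minimal illustration, not the paper's released code: it assumes neighbors are retrieved in the space of base-model prediction vectors (consistent with the no-raw-feature constraint), and the names `build_context`, `train_preds`, and `train_labels` are hypothetical.

```python
import numpy as np

def build_context(test_pred, train_preds, train_labels, k=5):
    """Retrieve the k training instances whose base-model prediction
    vectors lie closest (Euclidean distance) to the test instance's.
    Raw tabular features are never touched, matching the constraint
    described above. (Sketch under assumptions; not the paper's code.)"""
    dists = np.linalg.norm(train_preds - test_pred, axis=1)
    idx = np.argsort(dists)[:k]
    return train_preds[idx], train_labels[idx]

# Toy example: 6 training instances, 3 base models, binary labels.
rng = np.random.default_rng(0)
train_preds = rng.random((6, 3))   # per-model class-1 probabilities
train_labels = np.array([0, 1, 0, 1, 1, 0])
ctx_preds, ctx_labels = build_context(train_preds[0], train_preds, train_labels, k=3)
print(ctx_preds.shape)   # (3, 3)
```

The returned neighborhood (prediction vectors plus their labels) is what gets serialized into the LLM's context.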
📝 Abstract
Tabular data, a fundamental data format in machine learning, is widely used in competitions and real-world applications. The performance of tabular models, such as gradient-boosted decision trees and neural networks, can vary significantly across datasets due to differences in feature distributions and task characteristics. Achieving top performance on each dataset often requires specialized expert knowledge. To address this variability, practitioners often aggregate the predictions of multiple models. However, conventional aggregation strategies typically rely on static combination rules and lack instance-level adaptability. In this work, we propose an in-context ensemble framework for tabular prediction that leverages large language models (LLMs) to perform dynamic, instance-specific integration of external model predictions. Without access to raw tabular features or semantic information, our method constructs a context around each test instance from its nearest neighbors and the predictions of a pool of external models. Within this enriched context, we introduce Chain of Tabular Thoughts (CoT$^2$), a prompting strategy that guides LLMs through multi-step, interpretable reasoning, moving closer to expert-level decision-making. Experimental results show that our method outperforms well-tuned baselines and standard ensemble techniques across a wide range of tabular datasets.
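To make the CoT$^2$ idea concrete, the sketch below assembles a multi-step reasoning prompt from a neighborhood context. The template wording, step breakdown, and function name (`cot2_prompt`) are assumptions for illustration only, not the paper's actual prompt; it shows that the LLM sees only predictions and neighbor labels, never raw features.

```python
def cot2_prompt(neighbor_preds, neighbor_labels, test_preds):
    """Build a hypothetical Chain of Tabular Thoughts prompt: the LLM
    receives the neighborhood (prediction vectors -> true labels) and
    the test instance's predictions, then is walked through explicit
    reasoning steps before committing to a final label."""
    lines = ["You are aggregating predictions from several tabular models."]
    lines.append("Neighborhood context (model predictions -> true label):")
    for preds, label in zip(neighbor_preds, neighbor_labels):
        lines.append(f"  {[round(p, 2) for p in preds]} -> {label}")
    lines.append(f"Test instance predictions: {[round(p, 2) for p in test_preds]}")
    lines.append("Step 1: Judge each model's reliability on the neighbors.")
    lines.append("Step 2: Weight the test predictions by that reliability.")
    lines.append("Step 3: State the final label and a one-line justification.")
    return "\n".join(lines)

prompt = cot2_prompt([[0.9, 0.2, 0.8], [0.1, 0.3, 0.2]], [1, 0], [0.8, 0.4, 0.7])
print(prompt)
```

The explicit step structure is what distinguishes this from a static, zero-shot ensemble prompt: the model is asked to reason about per-model reliability before fusing predictions.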