Harnessing LLMs Explanations to Boost Surrogate Models in Tabular Data Classification

📅 2025-05-09
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing LLM-based table classification methods suffer from high computational overhead, suboptimal in-context example selection, and insufficient interpretability. To address these limitations, we propose the first explanation-driven, three-stage *post-hoc* in-context learning framework: (1) an LLM generates structured reasoning paths for input tables; (2) semantically similar, highly relevant in-context examples are retrieved via embedding-based similarity matching; and (3) an explanation-guided lightweight surrogate language model (SLM) performs interpretable classification via fine-tuning or prompt optimization. Our paradigm integrates post-hoc explanation generation, semantic-aware retrieval, and SLM adaptation—eliminating the need for end-to-end LLM inference. Evaluated across diverse domain-specific table datasets, our approach achieves a 5.31% average accuracy improvement over strong baselines, while significantly enhancing inference efficiency and decision transparency. This work establishes a novel, resource-efficient paradigm for trustworthy table classification in constrained deployment scenarios.

📝 Abstract
Large Language Models (LLMs) have shown remarkable ability in solving complex tasks, making them a promising tool for enhancing tabular learning. However, existing LLM-based methods suffer from high resource requirements, suboptimal demonstration selection, and limited interpretability, which largely hinder their prediction performance and application in the real world. To overcome these problems, we propose a novel in-context learning framework for tabular prediction. The core idea is to leverage the explanations generated by LLMs to guide a smaller, locally deployable Surrogate Language Model (SLM) to make interpretable tabular predictions. Specifically, our framework involves three stages: (i) Post Hoc Explanation Generation, where LLMs are utilized to generate explanations for question-answer pairs in candidate demonstrations, providing insights into the reasoning behind each answer. (ii) Post Hoc Explanation-Guided Demonstration Selection, which uses the LLM-generated explanations to guide the selection of demonstrations from the candidate pool. (iii) Post Hoc Explanation-Guided Interpretable SLM Prediction, which uses the demonstrations obtained in stage (ii) as in-context examples and merges their corresponding explanations as rationales, improving the SLM's performance and guiding it to generate interpretable outputs. Experimental results highlight the framework's effectiveness, with an average accuracy improvement of 5.31% across tabular datasets from diverse domains.
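The explanation-guided demonstration selection in stage (ii) can be sketched as a similarity search over LLM-generated explanations. The following is a minimal illustration only: the bag-of-words embedder, tokenizer, field names, and toy credit-risk rows are placeholder assumptions standing in for the paper's actual semantic embeddings and datasets.

```python
import re
import numpy as np

def embed(text, vocab):
    # Toy bag-of-words stand-in for a real semantic embedder.
    v = np.zeros(len(vocab))
    for tok in re.findall(r"\w+", text.lower()):
        if tok in vocab:
            v[vocab[tok]] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

def select_demonstrations(query, candidates, k=2):
    # Rank candidate (question, answer, explanation) triples by cosine
    # similarity between the query and each LLM-generated explanation,
    # then keep the top-k as in-context demonstrations.
    vocab = {}
    for c in candidates:
        for tok in re.findall(r"\w+", c["explanation"].lower()):
            vocab.setdefault(tok, len(vocab))
    q = embed(query, vocab)
    return sorted(
        candidates,
        key=lambda c: float(q @ embed(c["explanation"], vocab)),
        reverse=True,
    )[:k]

candidates = [
    {"question": "age=25, income=30k", "answer": "low risk",
     "explanation": "young applicant with modest income low debt"},
    {"question": "age=60, income=20k", "answer": "high risk",
     "explanation": "older applicant low income relative to obligations"},
    {"question": "age=27, income=32k", "answer": "low risk",
     "explanation": "young applicant stable modest income"},
]
demos = select_demonstrations("young applicant modest income", candidates, k=2)
```

In this toy run, the two candidates whose explanations overlap most with the query are retained, illustrating how explanation text, rather than raw feature values, drives the retrieval.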
Problem

Research questions and friction points this paper is trying to address.

Overcoming high resource demands in LLM-based tabular learning
Improving suboptimal demonstration selection for tabular prediction
Enhancing interpretability of surrogate models in tabular data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverage LLM explanations for surrogate model guidance
Select demonstrations using LLM-generated explanations
Merge explanations as rationales for interpretable SLM outputs
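The third stage, feeding the selected demonstrations with their explanations as rationales into the SLM, amounts to prompt assembly. A minimal sketch follows; the field names and prompt template are illustrative assumptions, not the paper's exact format.

```python
def build_prompt(demonstrations, query):
    # Assemble an in-context prompt: each demonstration contributes its
    # question, the LLM-generated explanation as a rationale, and the
    # answer; the query row is appended last for the SLM to complete,
    # so the model emits a rationale before its final prediction.
    blocks = [
        f"Question: {d['question']}\n"
        f"Rationale: {d['explanation']}\n"
        f"Answer: {d['answer']}"
        for d in demonstrations
    ]
    blocks.append(f"Question: {query}\nRationale:")
    return "\n\n".join(blocks)

prompt = build_prompt(
    [{"question": "age=27, income=32k",
      "explanation": "young applicant, stable modest income",
      "answer": "low risk"}],
    "age=26, income=31k",
)
```

Ending the prompt at `Rationale:` is what nudges the SLM to produce its reasoning before the answer, which is the source of the interpretable outputs described above.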
Ruxue Shi
Jilin University Grad Student
Tabular Learning · Data Mining
Hengrui Gu
North Carolina State University
Knowledge Maintenance
Xu Shen
Jilin University, Changchun, China
Xin Wang
Jilin University, Changchun, China