🤖 AI Summary
Existing table-oriented instruction-tuning studies suffer from inconsistent training data, hindering rigorous isolation of architectural versus data-quality effects on model performance.
Method: This work presents the first systematic disentanglement of these two factors, applying a unified instruction-tuning pipeline and standardized evaluation protocol across Mistral, OLMo, and Phi series models, enabling fair cross-model and cross-dataset comparisons. The methodology combines instruction tuning, multi-benchmark reproduction (including HiTab for table question answering), cross-domain generalization testing, and joint evaluation on both table-specific and general-purpose NLP benchmarks.
Contributions/Results: We empirically uncover an inherent trade-off between table specialization and general language capability; achieve new state-of-the-art performance on HiTab; match or exceed prior table LLMs in reproduced evaluations; and, crucially, provide the first quantitative decomposition of performance gains attributable separately to model architecture and training data quality.
📝 Abstract
Recent advances in natural language processing have leveraged instruction tuning to enhance Large Language Models (LLMs) for table-related tasks. However, previous works train different base models with different training data, lacking an apples-to-apples comparison across the resulting table LLMs. To address this, we fine-tune base models from the Mistral, OLMo, and Phi families on existing public training datasets. Our replication achieves performance on par with or surpassing existing table LLMs, establishing new state-of-the-art performance on HiTab, a table question-answering dataset. More importantly, through systematic out-of-domain evaluation, we decouple the contributions of training data and the base model, providing insight into their individual impacts. In addition, we assess the effects of table-specific instruction tuning on general-purpose benchmarks, revealing trade-offs between specialization and generalization.