Table Understanding and (Multimodal) LLMs: A Cross-Domain Case Study on Scientific vs. Non-Scientific Data

📅 2025-06-30

🤖 AI Summary

Problem: The robustness of current large language models (LLMs) at table understanding has not been characterized across modalities or across scientific versus non-scientific domains, and the bottlenecks in interpreting scientifically complex tables (e.g., those with nested structures, units, and mathematical notation) are poorly understood. Method: The paper introduces TableEval, a multi-domain, multi-format benchmark of 3,017 tables from scholarly publications, Wikipedia, and financial reports, each provided in five formats (Image, Dictionary, HTML, XML, and LaTeX), enabling fine-grained cross-domain and cross-modal evaluation. An interpretability analysis complements the quantitative assessment by measuring context usage and input relevance. Contribution/Results: Experiments indicate that state-of-the-art LLMs remain robust across table modalities and on non-scientific tables but degrade substantially on scientific ones, especially those involving symbolic reasoning and hierarchical semantics, which points to limitations in domain-specific knowledge integration and formal symbol manipulation.

📝 Abstract

Tables are among the most widely used tools for representing structured data in research, business, medicine, and education. Although LLMs demonstrate strong performance in downstream tasks, their efficiency in processing tabular data remains underexplored. In this paper, we investigate the effectiveness of both text-based and multimodal LLMs on table understanding tasks through a cross-domain and cross-modality evaluation. Specifically, we compare their performance on tables from scientific vs. non-scientific contexts and examine their robustness on tables represented as images vs. text. Additionally, we conduct an interpretability analysis to measure context usage and input relevance. We also introduce the TableEval benchmark, comprising 3017 tables from scholarly publications, Wikipedia, and financial reports, where each table is provided in five different formats: Image, Dictionary, HTML, XML, and LaTeX. Our findings indicate that while LLMs maintain robustness across table modalities, they face significant challenges when processing scientific tables.
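
To make the idea of one table in several text formats concrete, here is a minimal Python sketch that serializes a toy table as a dictionary, HTML, XML, and LaTeX. The function names, the XML element names (`row`, `cell`, `column`), and the sample data are all hypothetical choices for illustration; the abstract does not describe TableEval's actual schemas or conversion pipeline, and the Image format is omitted because it would need a rendering step.

```python
from xml.etree import ElementTree as ET

# Toy table, invented for illustration only (not taken from TableEval).
HEADER = ["Item", "Value"]
ROWS = [["a", "1"], ["b", "2"]]


def to_dict(header, rows):
    """Dictionary form: header and rows kept as plain lists (assumed layout)."""
    return {"header": list(header), "rows": [list(r) for r in rows]}


def to_html(header, rows):
    """HTML form: one <tr> of <th> cells, then one <tr> of <td> cells per row."""
    table = ET.Element("table")
    head = ET.SubElement(table, "tr")
    for name in header:
        ET.SubElement(head, "th").text = name
    for row in rows:
        tr = ET.SubElement(table, "tr")
        for cell in row:
            ET.SubElement(tr, "td").text = cell
    return ET.tostring(table, encoding="unicode", method="html")


def to_xml(header, rows):
    """XML form: each cell carries its column name as an attribute (assumed schema)."""
    table = ET.Element("table")
    for row in rows:
        row_el = ET.SubElement(table, "row")
        for name, cell in zip(header, row):
            ET.SubElement(row_el, "cell", {"column": name}).text = cell
    return ET.tostring(table, encoding="unicode")


def to_latex(header, rows):
    """LaTeX form: a left-aligned tabular with a rule under the header."""
    def esc(text):
        # Escape a few characters that are special in LaTeX text mode.
        for ch in "&%_#":
            text = text.replace(ch, "\\" + ch)
        return text

    lines = ["\\begin{tabular}{" + "l" * len(header) + "}"]
    lines.append(" & ".join(esc(h) for h in header) + " \\\\")
    lines.append("\\hline")
    for row in rows:
        lines.append(" & ".join(esc(c) for c in row) + " \\\\")
    lines.append("\\end{tabular}")
    return "\n".join(lines)


if __name__ == "__main__":
    for render in (to_dict, to_html, to_xml, to_latex):
        print(render(HEADER, ROWS))
        print()
```

Passing the same content through each serializer is one way to set up a cross-format comparison: the information is held constant, so any difference in model behavior can be attributed to the representation.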

Problem

Research questions and friction points this paper is trying to address.

Evaluating LLMs' table understanding across scientific and non-scientific domains

Assessing robustness of text-based and multimodal LLMs on tabular data

Measuring context usage and input relevance in table interpretation tasks

Innovation

Methods, ideas, or system contributions that make the work stand out.

Evaluates text-based and multimodal LLMs on tables

Introduces TableEval benchmark with diverse table formats

Analyzes LLM robustness across scientific and non-scientific tables

🔎 Similar Papers

TableBench: A Comprehensive and Complex Benchmark for Table Question Answering