🤖 AI Summary
Existing methods for evaluating LLM-based table generation either neglect structural constraints or rely on fixed reference tables, limiting their generalizability. This paper introduces TabReX, the first reference-free, graph-structure-driven evaluation framework for table generation. TabReX represents both textual sources and tabular outputs as knowledge graphs and employs LLM-guided alignment to yield interpretable, quantitative scores measuring both structural fidelity and factual accuracy. Key contributions include: (1) attribute-level graph reasoning; (2) customizable rubrics; (3) controllable sensitivity and specificity; (4) cell-level error tracing; and (5) fine-grained model-vs-prompt analysis. Evaluated on TabReX-Bench, a new multidimensional perturbation benchmark covering six domains and twelve perturbation types, TabReX significantly outperforms existing metrics: it achieves the highest correlation with expert rankings and remains robust under strong perturbations. TabReX establishes a new, interpretable paradigm for trustworthy table generation evaluation.
📝 Abstract
Evaluating the quality of tables generated by large language models (LLMs) remains an open challenge: existing metrics either flatten tables into text, ignoring structure, or rely on fixed references that limit generalization. We present TabReX, a reference-less, property-driven framework for evaluating tabular generation via graph-based reasoning. TabReX converts both source text and generated tables into canonical knowledge graphs, aligns them through an LLM-guided matching process, and computes interpretable, rubric-aware scores that quantify structural and factual fidelity. The resulting metric provides controllable trade-offs between sensitivity and specificity, yielding human-aligned judgments and cell-level error traces. To systematically assess metric robustness, we introduce TabReX-Bench, a large-scale benchmark spanning six domains and twelve planner-driven perturbation types across three difficulty tiers. Empirical results show that TabReX achieves the highest correlation with expert rankings, remains stable under harder perturbations, and enables fine-grained model-vs-prompt analysis, establishing a new paradigm for trustworthy, explainable evaluation of structured generation systems.
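To make the pipeline concrete, here is a minimal sketch (not the authors' implementation; the function names, the triple representation, and the F-beta scoring rule are all illustrative assumptions) of the core idea: flatten a table into knowledge-graph-style (entity, attribute, value) triples, then score a generated table against source-derived triples with a beta parameter that trades sensitivity against specificity. The actual TabReX metric uses LLM-guided alignment and rubric-aware scoring rather than exact triple matching.

```python
# Illustrative sketch only: exact triple overlap stands in for TabReX's
# LLM-guided graph alignment and rubric-aware scoring.

def table_to_triples(table, key_col):
    """Flatten a table (list of row dicts) into (key, attribute, value) triples."""
    triples = set()
    for row in table:
        key = row[key_col]
        for attr, val in row.items():
            if attr != key_col:
                triples.add((key, attr, str(val)))
    return triples

def fidelity(source_triples, generated_triples, beta=1.0):
    """F-beta over triple overlap; beta > 1 favors recall (sensitivity),
    beta < 1 favors precision (specificity)."""
    tp = len(source_triples & generated_triples)
    if tp == 0:
        return 0.0
    precision = tp / len(generated_triples)
    recall = tp / len(source_triples)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Toy example: one hallucinated cell (pop_m) out of two attributes.
source = [{"city": "Paris", "country": "France", "pop_m": "2.1"}]
generated = [{"city": "Paris", "country": "France", "pop_m": "2.2"}]
s = table_to_triples(source, "city")
g = table_to_triples(generated, "city")
print(fidelity(s, g))  # → 0.5

# Cell-level error tracing falls out of the set difference:
print(sorted(g - s))  # → [('Paris', 'pop_m', '2.2')]
```

In this toy setup, the mismatched triples directly identify which cells diverge, mirroring (in a much cruder form) the cell-level error traces the framework reports.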