🤖 AI Summary
Claim verification against tables in scientific papers suffers from an explainability bottleneck: existing methods yield only binary labels without localizing critical evidence cells or providing reasoning justifications. Method: We reformulate the task as a cell-level table-text alignment explanation problem and propose the first explainable verification paradigm for scientific tables. We introduce SciTab-X—the first dataset with human-annotated minimal rationale cells—and establish a taxonomy for ambiguous cases. Our approach integrates data augmentation, cell-level supervised training, and LLM-aligned evaluation. Results: Explicit cell-level alignment significantly improves verification accuracy. Moreover, experiments reveal that while mainstream LLMs often produce correct binary labels, the rationale cells they identify diverge substantially from human annotations—exposing severe non-faithfulness in their reasoning processes. This highlights a critical gap between label accuracy and reasoning fidelity in current LLM-based table verification systems.
📝 Abstract
Scientific claim verification against tables typically requires predicting whether a given table supports or refutes a claim. However, we argue that predicting the final label alone is insufficient: it reveals little about the model's reasoning and offers limited interpretability. To address this, we reframe table-text alignment as an explanation task, requiring models to identify the table cells essential for claim verification. We build a new dataset by extending the SciTab benchmark with human-annotated cell-level rationales. Annotators verify the claim label and highlight the minimal set of cells needed to support their decision. Based on the collected annotations, we propose a taxonomy for handling ambiguous cases. Our experiments show that (i) incorporating table alignment information improves claim verification performance, and (ii) most LLMs, while often predicting correct labels, fail to recover human-aligned rationales, suggesting that their predictions do not stem from faithful reasoning.