🤖 AI Summary
Scientific tables are highly technical and information-dense, and are frequently misinterpreted; existing models, including LLMs, struggle with precise fine-grained claim verification over them, in part because of the heavy reasoning load such tables impose. To address this, we propose a modular reasoning framework grounded in reusable atomic skills: guided by Cognitive Load Theory, we decompose verification into interpretable sub-tasks and introduce SciAtomicBench, the first cross-domain, fine-grained benchmark for scientific table claim verification. A dynamic skill-chaining mechanism aligns claims with table evidence and enables structured table understanding, while lightweight fine-tuning requires only 350 samples. Our method achieves state-of-the-art performance on SciAtomicBench, significantly outperforming GPT-4o's chain-of-thought prompting in verification accuracy, while generalizing strongly and reducing training-data requirements by an order of magnitude. The core innovations are atomic skill modeling and dynamic skill orchestration, which together enable low-resource, high-accuracy, and cross-domain transferable scientific table claim verification.
📝 Abstract
Scientific texts often convey authority through their technical language and complex data. However, this very complexity can enable the spread of misinformation. Non-experts are particularly susceptible to misleading claims based on scientific tables, given the tables' high information density and perceived credibility. Existing table claim verification models, including state-of-the-art large language models (LLMs), often struggle with precise fine-grained reasoning, leading to errors when verifying scientific claims. Inspired by Cognitive Load Theory, we propose that a model's ability to interpret table-based claims can be enhanced by reducing cognitive load through modular, reusable reasoning components (i.e., atomic skills). We introduce a skill-chaining schema that dynamically composes these skills to support more accurate and generalizable reasoning at reduced cognitive load. To evaluate this, we create SciAtomicBench, a cross-domain benchmark with fine-grained reasoning annotations. With only 350 fine-tuning examples, our model trained with atomic reasoning outperforms GPT-4o's chain-of-thought method, achieving state-of-the-art results with far less training data.