Generalizing Numerical Reasoning in Table Data through Operation Sketches and Self-Supervised Learning

📅 2026-04-23

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing models for numerical reasoning over specialized tables often rely on superficial associations between table headers and operations, which leads to poor cross-domain generalization. This work proposes the TaNOS framework, which decouples domain-specific semantics from numerical operation structure through header anonymization, operation sketch guidance, and program-first self-supervised pretraining. TaNOS is the first approach to jointly integrate operation sketches, header-agnostic representations, and correctness-guaranteed self-supervision. Applied to an 8B instruction-tuned model, it achieves 80.13% execution accuracy on FinQA using only 10% of the training data, surpassing the SFT baseline trained on the full training set (73.97%) and outperforming closed-source models such as GPT-5 and Gemini-2.5-Pro, while reducing the cross-domain performance gap to under two percentage points.
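To make the first two ideas concrete, here is a minimal Python sketch of header anonymization and an operation sketch. It is an illustration under stated assumptions, not the authors' implementation: the table layout, the `col_i` placeholder scheme, the FinQA-style program string, and the function names `anonymize_headers` and `operation_sketch` are all hypothetical, and the paper's actual sketch format may differ.

```python
import re

def anonymize_headers(table):
    """Swap every header for a neutral placeholder such as col_0, col_1."""
    placeholders = [f"col_{i}" for i in range(len(table["headers"]))]
    # Keep the placeholder -> original mapping so results can be traced back.
    mapping = dict(zip(placeholders, table["headers"]))
    return {"headers": placeholders, "rows": table["rows"]}, mapping

def operation_sketch(program):
    """Keep only the ordered operation names and drop every argument."""
    return re.findall(r"([A-Za-z_]+)\(", program)

# Hypothetical financial table; the values are made up for illustration.
table = {
    "headers": ["item", "2019", "2020"],
    "rows": [["revenue", 5735, 5829], ["costs", 4100, 4300]],
}
anon, mapping = anonymize_headers(table)
sketch = operation_sketch("subtract(5829, 5735), divide(#0, 5735)")

print(anon["headers"])  # ['col_0', 'col_1', 'col_2']
print(sketch)           # ['subtract', 'divide']
```

With the headers replaced, a model can no longer map a word like "revenue" directly to an operation, and the sketch exposes only the structure of the computation (subtract, then divide) without any domain vocabulary.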

📝 Abstract

Numerical reasoning over expert-domain tables often exhibits high in-domain accuracy but limited robustness to domain shift. Models trained with supervised fine-tuning (SFT) on specific datasets tend to rely on header-operation shortcuts rather than structural reasoning. We introduce TaNOS, a continual pre-training framework comprising three components: (i) header anonymization to reduce lexical memorization, (ii) operation sketches that provide minimal structural cues, and (iii) self-supervised pretraining that constructs correctness-guaranteed program-question pairs from given tables in a program-first manner. By decoupling domain semantics from numerical operation structure, TaNOS improves the transferability of numerical reasoning. Applied to an 8B instruction-tuned model, TaNOS achieves 80.13% execution accuracy on FinQA with only 10% of the training data, outperforming both the SFT baseline trained on the full training set (73.97%) and proprietary models such as GPT-5 and Gemini-2.5-Pro. Furthermore, in the domain-shift experiments, TaNOS shows a nearly negligible cross-domain gap (<2pp), whereas standard SFT shows a gap of over 10pp. These results suggest that structural guidance with operation sketches, header-agnostic representations, and correctness-guaranteed self-supervision can improve the robustness of numerical reasoning across diverse expert-domain tables.
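The program-first construction in component (iii) and the execution-accuracy metric can be sketched roughly as follows. This is a simplified illustration, not the method described in the paper: it assumes single-step binary programs, a fixed question template (the real pipeline presumably produces more natural questions), and hypothetical names (`OPS`, `execute`, `make_pair`, `execution_accuracy`). The point it shows is that sampling and executing the program before writing the question makes the answer correct by construction.

```python
import math
import random
import re

# Hypothetical operation set; the paper's operation inventory may differ.
OPS = {
    "add": lambda a, b: a + b,
    "subtract": lambda a, b: a - b,
    "divide": lambda a, b: a / b,
}

def execute(program):
    """Run a single-step program such as 'subtract(5829, 5735)'."""
    op, a, b = re.fullmatch(r"(\w+)\(([^,]+), ([^)]+)\)", program).groups()
    return OPS[op](float(a), float(b))

def make_pair(table, rng):
    """Sample a program first, execute it, then phrase a question for it."""
    col = rng.randrange(1, len(table["headers"]))  # column 0 holds row labels
    r1, r2 = rng.sample(range(len(table["rows"])), 2)
    a, b = table["rows"][r1][col], table["rows"][r2][col]
    op = rng.choice(sorted(OPS))
    if op == "divide" and b == 0:
        op = "subtract"  # avoid sampling an undefined program
    program = f"{op}({a}, {b})"
    answer = execute(program)  # the label comes from execution, not annotation
    question = (
        f"In column {table['headers'][col]}, what is {op} applied to "
        f"{table['rows'][r1][0]} and {table['rows'][r2][0]}?"
    )
    return {"question": question, "program": program, "answer": answer}

def execution_accuracy(programs, gold_answers):
    """Fraction of predicted programs whose result matches the gold answer."""
    correct = 0
    for program, gold in zip(programs, gold_answers):
        try:
            correct += math.isclose(execute(program), gold, rel_tol=1e-6)
        except (AttributeError, KeyError, ValueError, ZeroDivisionError):
            pass  # malformed or non-executable programs count as wrong
    return correct / len(gold_answers)

table = {
    "headers": ["col_0", "col_1", "col_2"],
    "rows": [["row_a", 5735, 5829], ["row_b", 4100, 4300]],
}
pair = make_pair(table, random.Random(0))
print(pair["question"], pair["program"], pair["answer"])
```

Because the answer is obtained by running the sampled program, no pair can carry a wrong label, which is the sense in which such pairs are correctness-guaranteed.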

Problem

Research questions and friction points this paper is trying to address.

numerical reasoning

domain shift

table data

structural reasoning

robustness

Innovation

Methods, ideas, or system contributions that make the work stand out.

operation sketches

self-supervised learning

header anonymization