🤖 AI Summary
Real-world tables often suffer from irregular structures, heterogeneous value formats, and implicit relationships, which undermine the reliability of downstream reasoning and question answering. This work proposes QuIeTT, a query-agnostic table preprocessing framework that losslessly transforms raw tables into a unified, SQL-ready canonical representation before any query is observed. It decouples data cleaning from reasoning through schema normalization, value standardization, explicit encoding of implicit relationships, and preservation of raw table snapshots for provenance. This improves the reliability and generalization of table reasoning without modifying downstream models. Experiments on four benchmarks (WikiTQ, HiTab, NQ-Table, and SequentialQA) show consistent gains across models and reasoning paradigms, with particularly strong results on a challenge set of structurally diverse, unseen questions.
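One common kind of implicit relationship is a section-label row (for example, a row reading only "Europe") that silently applies to the data rows beneath it. The sketch below shows, under stated assumptions, how such a relationship could be made explicit as its own column. It is not QuIeTT's method: the function name `expose_group_rows` and the detection heuristic (first cell filled, all other cells empty) are hypothetical choices for illustration.

```python
from typing import List, Optional, Tuple

Row = List[str]


def expose_group_rows(header: Row, rows: List[Row]) -> Tuple[Row, List[Row]]:
    """Turn section-label rows into an explicit 'group' column.

    Hypothetical heuristic (not QuIeTT's rule): a row whose first cell is
    non-empty and whose remaining cells are all empty is treated as a group
    label for the data rows that follow it, until the next label row.
    """
    new_header = ["group"] + header
    out: List[Row] = []
    current: Optional[str] = None
    for row in rows:
        first, rest = row[0].strip(), row[1:]
        if first and all(not cell.strip() for cell in rest):
            current = first  # label row: remember it, emit no data row
            continue
        out.append([current or ""] + row)  # attach the active label
    return new_header, out


header, rows = expose_group_rows(
    ["Country", "Medals"],
    [["Europe", ""], ["France", "10"], ["Italy", "7"], ["Asia", ""], ["Japan", "12"]],
)
print(header)  # ['group', 'Country', 'Medals']
print(rows)    # [['Europe', 'France', '10'], ['Europe', 'Italy', '7'], ['Asia', 'Japan', '12']]
```

After this step, a question such as "How many medals did European countries win?" becomes a plain filter on the `group` column instead of depending on row order.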
📝 Abstract
Real-world tables often exhibit irregular schemas, heterogeneous value formats, and implicit relational structure, which degrade the reliability of downstream table reasoning and question answering. Most existing approaches address these issues in a query-dependent manner, entangling table cleanup with reasoning and thus limiting generalization. We introduce QuIeTT, a query-independent table transformation framework that preprocesses raw tables into a single SQL-ready canonical representation before any test-time queries are observed. QuIeTT performs lossless schema and value normalization, exposes implicit relations, and preserves full provenance via raw table snapshots. By decoupling table transformation from reasoning, QuIeTT enables cleaner, more reliable, and highly efficient querying without modifying downstream models. Experiments on four benchmarks (WikiTQ, HiTab, NQ-Table, and SequentialQA) show consistent gains across models and reasoning paradigms, with particularly strong improvements on a challenge set of structurally diverse, unseen questions.
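To give a rough picture of what a "SQL-ready canonical representation" with preserved provenance could look like, here is a minimal sketch using Python's standard-library `sqlite3` module. It is not QuIeTT's implementation. The helper names (`normalize_name`, `normalize_value`, `load_canonical`), the null tokens, the numeric-parsing rules, and the `_raw`/`_columns` side tables are all hypothetical choices made for illustration. Normalized cells go into one table, and the untouched rows and original headers are stored alongside it so that no information is lost.

```python
import json
import re
import sqlite3
from typing import List, Sequence, Set, Union

Value = Union[int, float, str, None]

# Cell strings treated as missing values (an illustrative assumption).
_NULL_TOKENS = {"", "-", "n/a", "na", "none"}


def normalize_name(name: str, seen: Set[str]) -> str:
    """Map a raw header to a unique, SQL-friendly snake_case identifier."""
    base = re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_") or "col"
    if base[0].isdigit():
        base = "c_" + base  # avoid identifiers that start with a digit
    candidate, suffix = base, 2
    while candidate in seen:  # disambiguate duplicate headers
        candidate = f"{base}_{suffix}"
        suffix += 1
    seen.add(candidate)
    return candidate


def normalize_value(raw: str) -> Value:
    """Parse a cell as int/float when it looks numeric; map null tokens to None."""
    s = raw.strip()
    if s.lower() in _NULL_TOKENS:
        return None
    digits = re.sub(r"[,$]", "", s)  # drop thousands separators and dollar signs
    if re.fullmatch(r"-?\d+", digits):
        return int(digits)
    if re.fullmatch(r"-?\d*\.\d+", digits):
        return float(digits)
    return s  # other text is kept (trimmed)


def load_canonical(conn: sqlite3.Connection, table: str,
                   header: Sequence[str], rows: Sequence[Sequence[str]]) -> List[str]:
    """Write a normalized table, a raw-row snapshot, and a header mapping.

    `table` is interpolated into SQL, so it must be a trusted identifier.
    Every row is assumed to have exactly len(header) cells.
    """
    seen: Set[str] = {"row_id"}  # reserve the key column name
    cols = [normalize_name(h, seen) for h in header]
    col_defs = ", ".join(f'"{c}"' for c in cols)
    conn.execute(f'CREATE TABLE "{table}" (row_id INTEGER PRIMARY KEY, {col_defs})')
    conn.execute(f'CREATE TABLE "{table}_raw" (row_id INTEGER PRIMARY KEY, raw_json TEXT)')
    conn.execute(f'CREATE TABLE "{table}_columns" (canonical TEXT, original TEXT)')
    conn.executemany(f'INSERT INTO "{table}_columns" VALUES (?, ?)', list(zip(cols, header)))
    marks = ", ".join("?" for _ in cols)
    for i, row in enumerate(rows):
        conn.execute(f'INSERT INTO "{table}" VALUES (?, {marks})',
                     [i, *(normalize_value(v) for v in row)])
        conn.execute(f'INSERT INTO "{table}_raw" VALUES (?, ?)',
                     (i, json.dumps(list(row))))
    return cols


conn = sqlite3.connect(":memory:")
cols = load_canonical(conn, "revenue", ["Year", "Total Revenue ($)", "Notes"],
                      [["2019", "$1,200", "n/a"], ["2020", "1,450", "restated"]])
print(cols)  # ['year', 'total_revenue', 'notes']
print(conn.execute("SELECT SUM(total_revenue) FROM revenue").fetchone()[0])  # 2650
```

Keeping the raw rows keyed by the same `row_id` means any answer computed over the normalized table can be traced back to the original cells, which is one simple way to keep a transformation lossless.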