🤖 AI Summary
Problem: Schema linking in multi-table question answering (QA) is unreliable on complex real-world tables because their schema structures are ambiguous or heterogeneous. Method: This paper proposes a schema graph modeling approach built on human-curated relational knowledge: (1) constructing a structured schema graph grounded in domain knowledge; (2) traversing the graph under the guidance of the natural language query to generate interpretable reasoning chains; and (3) applying sub-path merging and pruning strategies to improve the efficiency and logical coherence of cross-table reasoning. Contribution/Results: To the authors' knowledge, this is the first work to deploy human-guided schema graphs in industrial-scale multi-table QA, substantially reducing reliance on large language models (LLMs). Experiments on standard benchmarks and a large-scale real-world industrial dataset show that the method consistently outperforms state-of-the-art approaches and remains robust on complex, heterogeneous tabular data with diverse column semantics.
📝 Abstract
Large language models (LLMs) have shown promise in table Question Answering (Table QA). However, extending these capabilities to multi-table QA remains challenging due to unreliable schema linking across complex tables. Existing methods based on semantic similarity work well only on simplified hand-crafted datasets and struggle to handle complex, real-world scenarios with numerous and diverse columns. To address this, we propose a graph-based framework that leverages human-curated relational knowledge to explicitly encode schema links and join paths. Given a natural language query, our method searches this graph to construct interpretable reasoning chains, aided by pruning and sub-path merging strategies to enhance efficiency and coherence. Experiments on both standard benchmarks and a realistic, large-scale dataset demonstrate the effectiveness of our approach. To our knowledge, this is the first multi-table QA system applied to truly complex industrial tabular data.
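The idea of searching a curated schema graph for join paths can be illustrated with a minimal sketch. This is not the authors' implementation: the table names, join keys, and the helper functions `build_graph`, `join_path`, and `merge_paths` are all hypothetical. Plain breadth-first search stands in for the query-guided search, and de-duplicating shared joins stands in for sub-path merging.

```python
from collections import deque

# Hypothetical human-curated schema graph: each edge is a validated join key.
SCHEMA_EDGES = [
    ("orders", "customers", "customer_id"),
    ("orders", "order_items", "order_id"),
    ("order_items", "products", "product_id"),
    ("products", "suppliers", "supplier_id"),
]


def build_graph(edges):
    """Build an undirected adjacency map: table -> [(neighbor, join_key)]."""
    graph = {}
    for left, right, key in edges:
        graph.setdefault(left, []).append((right, key))
        graph.setdefault(right, []).append((left, key))
    return graph


def join_path(graph, start, goal):
    """Return the shortest chain of (table_a, table_b, key) joins, or None."""
    queue = deque([(start, [])])
    seen = {start}  # skipping visited tables keeps the search from looping
    while queue:
        table, path = queue.popleft()
        if table == goal:
            return path
        for neighbor, key in graph.get(table, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, path + [(table, neighbor, key)]))
    return None


def merge_paths(paths):
    """Merge join paths, keeping each shared join only once (order preserved)."""
    merged, seen = [], set()
    for path in paths:
        for a, b, key in path:
            edge = (frozenset((a, b)), key)
            if edge not in seen:
                seen.add(edge)
                merged.append((a, b, key))
    return merged


graph = build_graph(SCHEMA_EDGES)
# Suppose a query mentions customers, products, and suppliers.
p1 = join_path(graph, "customers", "products")
p2 = join_path(graph, "customers", "suppliers")
plan = merge_paths([p1, p2])
```

Here `p1` and `p2` share their first three joins, so the merged `plan` contains four joins rather than seven. Because every edge is an explicit, curated join key, the resulting chain can be read directly as a sequence of joins, which is what makes this style of reasoning chain interpretable.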