CFMS: A Coarse-to-Fine Multimodal Synthesis Framework for Enhanced Tabular Reasoning

📅 2026-04-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limited ability of existing symbolic approaches to effectively capture the holistic visual patterns of tables. To overcome this challenge, the authors propose a coarse-to-fine two-stage reasoning framework: first, leveraging multimodal large language models (MLLMs) to generate knowledge tuples from multiple visual perspectives, thereby enabling coarse-grained table understanding; then guiding a symbolic reasoning engine to perform fine-grained, iterative table operations. The approach innovatively decouples visual perception from symbolic reasoning in a hierarchical manner and constructs a dynamic reasoning map, significantly enhancing adaptability to both large-scale tables and smaller backbone models. Competitive accuracy is achieved on the WikiTQ and TabFact benchmarks, with particularly robust performance on complex and large-scale tabular data.

📝 Abstract
Reasoning over tabular data is a crucial capability for tasks like question answering and fact verification, as it requires models to comprehend both free-form questions and semi-structured tables. However, while methods like Chain-of-Thought (CoT) introduce reasoning chains, purely symbolic methods are inherently limited by their blindness to holistic visual patterns. To address this, we propose the Coarse-to-Fine Multimodal Synthesis framework (CFMS), a novel two-stage paradigm that hierarchically decouples high-level visual perception from granular symbolic reasoning. In the Coarse Stage, CFMS leverages Multimodal Large Language Models (MLLMs) to perform a one-time synthesis of a multi-perspective knowledge tuple. This tuple subsequently serves as a dynamic reasoning map to guide the Fine Stage, where a symbolic engine executes a targeted and efficient sequence of iterative operations over the table. Extensive experiments on the WikiTQ and TabFact benchmarks demonstrate that CFMS achieves competitive accuracy. The framework exhibits particular robustness when handling large tables and when instantiated with smaller backbone models, validating its effectiveness and generalizability.
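
The two-stage control flow described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `KnowledgeTuple` fields, the keyword heuristic standing in for the MLLM call, and the two table operations are all hypothetical and chosen only to show how a one-time coarse pass might guide a sequence of symbolic steps.

```python
from dataclasses import dataclass

Table = list[dict[str, str]]


@dataclass
class KnowledgeTuple:
    # Hypothetical fields; the abstract does not enumerate the tuple's contents.
    relevant_columns: list[str]
    row_hint: tuple[str, str]  # (column, value) that matching rows should contain
    target_column: str


def coarse_stage(table: Table, question: str) -> KnowledgeTuple:
    """Stand-in for the one-time MLLM pass over the rendered table.

    A keyword heuristic fakes the perception step (assumption, not the paper's method).
    """
    q = question.lower()
    columns = list(table[0])
    target = next((c for c in columns if c.lower() in q), None)
    hint = next(
        ((c, row[c]) for row in table for c in columns
         if row[c] and row[c].lower() in q),
        None,
    )
    if target is None or hint is None:
        raise ValueError("heuristic could not ground the question in the table")
    return KnowledgeTuple([hint[0], target], hint, target)


def select_columns(table: Table, cols: list[str]) -> Table:
    return [{c: row[c] for c in cols} for row in table]


def filter_rows(table: Table, col: str, value: str) -> Table:
    return [row for row in table if row[col] == value]


def fine_stage(table: Table, kt: KnowledgeTuple) -> list[str]:
    """Apply symbolic operations one at a time, guided by the knowledge tuple."""
    plan = [
        lambda t: select_columns(t, kt.relevant_columns),
        lambda t: filter_rows(t, *kt.row_hint),
    ]
    for op in plan:
        table = op(table)  # each step shrinks the working table
    return [row[kt.target_column] for row in table]


def answer(table: Table, question: str) -> list[str]:
    return fine_stage(table, coarse_stage(table, question))


EXAMPLE_TABLE: Table = [
    {"city": "Paris", "population": "2100000"},
    {"city": "Lyon", "population": "520000"},
]

print(answer(EXAMPLE_TABLE, "What is the population of Lyon?"))
```

In this toy setup the coarse pass runs once and the fine pass never revisits the full table, which mirrors the decoupling the abstract describes; a real system would replace `coarse_stage` with an MLLM call on a table image and let the plan adapt between steps.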
Problem

Research questions and friction points this paper is trying to address.

tabular reasoning
multimodal synthesis
visual patterns
symbolic reasoning
question answering
Innovation

Methods, ideas, or system contributions that make the work stand out.

Coarse-to-Fine
Multimodal Synthesis
Tabular Reasoning
Multimodal Large Language Models
Symbolic Reasoning