CFMS: A Coarse-to-Fine Multimodal Synthesis Framework for Enhanced Tabular Reasoning

📅 2026-04-13

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the inability of purely symbolic approaches to capture the holistic visual patterns of tables. The authors propose a coarse-to-fine, two-stage reasoning framework. In the coarse stage, a multimodal large language model (MLLM) performs a one-time synthesis of a multi-perspective knowledge tuple, giving a coarse-grained understanding of the table. In the fine stage, that tuple serves as a dynamic reasoning map that guides a symbolic engine through targeted, iterative table operations. The approach hierarchically decouples visual perception from symbolic reasoning. It achieves competitive accuracy on the WikiTQ and TabFact benchmarks and is particularly robust on large tables and when instantiated with smaller backbone models.

📝 Abstract

Reasoning over tabular data is a crucial capability for tasks like question answering and fact verification, as it requires models to comprehend both free-form questions and semi-structured tables. However, while methods like Chain-of-Thought (CoT) introduce reasoning chains, purely symbolic methods are inherently limited by their blindness to holistic visual patterns. To address this, we propose the Coarse-to-Fine Multimodal Synthesis framework (CFMS), a novel two-stage paradigm that hierarchically decouples high-level visual perception from granular symbolic reasoning. In the coarse stage, CFMS leverages Multimodal Large Language Models (MLLMs) to perform a one-time synthesis of a multi-perspective knowledge tuple. This tuple subsequently serves as a dynamic reasoning map to guide the fine stage, where a symbolic engine executes a targeted and efficient sequence of iterative operations over the table. Extensive experiments on the WikiTQ and TabFact benchmarks demonstrate that CFMS achieves competitive accuracy. The framework is particularly robust when handling large tables and when instantiated with smaller backbone models, validating its effectiveness and generalizability.
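The two-stage control flow described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the contents of the knowledge tuple or the set of symbolic operations, so the `KnowledgeTuple` fields, the `select_columns` and `filter_rows` operations, and the `fake_mllm` stub (which stands in for a real MLLM call on a table image) are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable

Table = list[dict[str, str]]
Operation = tuple[str, tuple[str, ...]]  # (operation name, arguments)


@dataclass(frozen=True)
class KnowledgeTuple:
    """Hypothetical coarse-stage output; the real tuple's fields are not given in the abstract."""
    summary: str                       # assumed: a high-level description of the table
    operations: tuple[Operation, ...]  # assumed: the "reasoning map" as an ordered plan


def select_columns(table: Table, args: tuple[str, ...]) -> Table:
    # Keep only the named columns in every row.
    return [{col: row[col] for col in args} for row in table]


def filter_rows(table: Table, args: tuple[str, ...]) -> Table:
    # Keep rows whose value in `column` equals `value`.
    column, value = args
    return [row for row in table if row[column] == value]


# Assumed operation vocabulary of the symbolic engine.
OPERATIONS: dict[str, Callable[[Table, tuple[str, ...]], Table]] = {
    "select_columns": select_columns,
    "filter_rows": filter_rows,
}


def coarse_stage(table: Table, question: str,
                 mllm: Callable[[Table, str], KnowledgeTuple]) -> KnowledgeTuple:
    # One-time synthesis: the model is called exactly once per question.
    return mllm(table, question)


def fine_stage(table: Table, plan: KnowledgeTuple) -> Table:
    # Iteratively apply the planned symbolic operations to the table.
    current = table
    for name, args in plan.operations:
        current = OPERATIONS[name](current, args)
    return current


def fake_mllm(table: Table, question: str) -> KnowledgeTuple:
    # Stub standing in for an MLLM; returns a fixed plan for the demo question.
    return KnowledgeTuple(
        summary="Medal counts per country",
        operations=(
            ("select_columns", ("country", "gold")),
            ("filter_rows", ("country", "Norway")),
            ("select_columns", ("gold",)),
        ),
    )


table = [
    {"country": "Norway", "gold": "16", "silver": "8"},
    {"country": "Germany", "gold": "12", "silver": "10"},
]
plan = coarse_stage(table, "How many gold medals did Norway win?", fake_mllm)
result = fine_stage(table, plan)
print(result)
```

The point of the sketch is the separation of concerns: the expensive perception step runs once and produces a plan, and the fine stage shrinks the table step by step under that plan rather than re-reading the whole table at each step.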

Problem

Research questions and friction points this paper is trying to address.

tabular reasoning

multimodal synthesis

visual patterns

symbolic reasoning

question answering

Innovation

Methods, ideas, or system contributions that make the work stand out.

Coarse-to-Fine

Multimodal Synthesis

Tabular Reasoning

Multimodal Large Language Models

Symbolic Reasoning
