MMFCTUB: Multi-Modal Financial Credit Table Understanding Benchmark

📅 2026-01-08
🏛️ arXiv.org
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the lack of high-quality multimodal benchmarks for financial credit table understanding, a field hindered by data inconsistency, high annotation costs, and misaligned evaluation metrics. To bridge this gap, we present the first multimodal benchmark comprising over 7,600 samples across five distinct table types. We introduce a weakly supervised construction approach that enforces constraint preservation and distributional consistency, alongside capability-driven question design and a mask-recovery strategy to evaluate models' abilities in cross-table structural awareness, domain knowledge integration, and numerical reasoning. Comprehensive evaluations of leading multimodal large language models reveal their strengths and limitations in structural comprehension and logical inference, establishing a reliable benchmark and evaluation paradigm for future research.

📝 Abstract
The advent of multi-modal language models (MLLMs) has spurred research into their application across various table understanding tasks. However, their performance in credit table understanding (CTU) for financial credit review remains largely unexplored due to three barriers: low data consistency, high annotation costs stemming from domain-specific knowledge and complex calculations, and gaps between benchmark evaluation paradigms and real-world scenarios. To address these challenges, we introduce MMFCTUB (Multi-Modal Financial Credit Table Understanding Benchmark), a practical benchmark encompassing more than 7,600 high-quality CTU samples across 5 table types. MMFCTUB employs a minimally supervised pipeline that adheres to inter-table constraints and maintains consistency of data distributions. The benchmark leverages capacity-driven questions and a mask-and-recovery strategy to evaluate models' cross-table structure perception, domain knowledge utilization, and numerical calculation capabilities. Using MMFCTUB, we conduct comprehensive evaluations of both proprietary and open-source MLLMs, revealing their strengths and limitations in CTU tasks. MMFCTUB serves as a valuable resource for the research community, facilitating rigorous evaluation of MLLMs in the domain of CTU.
Problem

Research questions and friction points this paper is trying to address.

credit table understanding
multi-modal language models
financial credit review
benchmark evaluation
data annotation
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-modal language models
credit table understanding
financial benchmark
mask-and-recovery strategy
weakly supervised pipeline
Cui Yakun
The Hong Kong University of Science and Technology
Yanting Zhang
Donghua University
Zhu Lei
The Hong Kong University of Science and Technology
Jian Xie
DuXiaoman Technology
Zhizhuo Kou
The Hong Kong University of Science and Technology
Hang Du
Beijing University of Posts and Telecommunications
Zhenghao Zhu
The Hong Kong University of Science and Technology
Sirui Han
The Hong Kong University of Science and Technology
Large Language Model
Interdisciplinary Artificial Intelligence