🤖 AI Summary
This work addresses the lack of high-quality multimodal benchmarks for financial credit table understanding, a field hindered by data inconsistency, high annotation costs, and misaligned evaluation metrics. To bridge this gap, we present the first multimodal benchmark comprising over 7,600 samples across five distinct table types. We introduce a weakly supervised construction approach that enforces constraint preservation and distributional consistency, alongside capability-driven question design and a mask-recovery strategy to evaluate models' abilities in cross-table structural awareness, domain knowledge integration, and numerical reasoning. Comprehensive evaluations of leading multimodal large language models reveal their strengths and limitations in structural comprehension and logical inference, establishing a reliable benchmark and evaluation paradigm for future research.
📄 Abstract
The advent of multimodal large language models (MLLMs) has spurred research into their application across various table understanding tasks. However, their performance in credit table understanding (CTU) for financial credit review remains largely unexplored due to the following barriers: low data consistency, high annotation costs stemming from domain-specific knowledge and complex calculations, and evaluation-paradigm gaps between benchmarks and real-world scenarios. To address these challenges, we introduce MMFCTUB (Multi-Modal Financial Credit Table Understanding Benchmark), a practical benchmark encompassing more than 7,600 high-quality CTU samples across 5 table types. MMFCTUB employs a minimally supervised pipeline that adheres to inter-table constraints and maintains distributional consistency. The benchmark leverages capability-driven questions and a mask-and-recovery strategy to evaluate models' cross-table structure perception, domain knowledge utilization, and numerical calculation capabilities. Utilizing MMFCTUB, we conduct comprehensive evaluations of both proprietary and open-source MLLMs, revealing their strengths and limitations in CTU tasks. MMFCTUB serves as a valuable resource for the research community, facilitating rigorous evaluation of MLLMs in the domain of CTU.