AI Summary
This study addresses the absence of benchmarks and models that support multilingual, multimodal financial numerical reasoning and question answering for Indic languages, a critical gap that undermines reliable high-stakes financial decision-making. To bridge this gap, the authors introduce FinVQA, the first benchmark covering six languages (English plus Hindi, Bengali, Marathi, Gujarati, and Tamil) with 18,900 samples spanning 14 financial domains, three difficulty levels, and four question types. They further propose FIND, a novel framework integrating supervised fine-tuning, constraint-aware decoding, and multimodal alignment mechanisms to ensure both numerical precision and cross-lingual semantic consistency. Experimental results demonstrate that FIND substantially outperforms existing baselines on multilingual multimodal financial QA tasks.
Abstract
Financial decision-making in multilingual settings demands accurate numerical reasoning grounded in diverse modalities, yet existing benchmarks largely overlook this high-stakes, real-world challenge, especially for Indic languages. We introduce FinVQA, a benchmark for evaluating financial numerical and multimodal reasoning in multilingual Indic contexts. FinVQA spans English, Hindi, Bengali, Marathi, Gujarati, and Tamil, and comprises 18,900 samples across 14 financial domains. The dataset captures diverse reasoning paradigms under realistic constraints, and is structured across three difficulty levels (easy, moderate, hard) and four question formats: multiple choice, fill-in-the-blank, table matching, and true/false. To address these challenges, we propose FIND, a framework that combines supervised fine-tuning with constraint-aware decoding to promote faithful numerical reasoning, robust multimodal grounding, and structured decision-making. Together, FinVQA and FIND establish a rigorous evaluation and modeling paradigm for high-stakes multilingual multimodal financial reasoning.
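The abstract does not say how constraint-aware decoding is implemented in FIND. As a rough illustration of the general idea only, the sketch below restricts a model's candidate answers to those that are valid for the question format before picking the best-scoring one. The format names, the option labels, the numeric pattern for fill-in-the-blank answers, and the function names `is_valid` and `constrained_argmax` are all assumptions made for this example, not details from the paper.

```python
import re
from typing import Dict, Optional

# Hypothetical answer-space constraints per question format.
# These are illustrative assumptions, not taken from FinVQA or FIND.
MCQ_OPTIONS = {"A", "B", "C", "D"}
TRUE_FALSE = {"True", "False"}
NUMERIC = re.compile(r"^-?\d+(\.\d+)?%?$")  # e.g. "42", "-3.5", "12.5%"


def is_valid(candidate: str, question_format: str) -> bool:
    """Return True if the candidate answer satisfies the format constraint."""
    if question_format == "multiple_choice":
        return candidate in MCQ_OPTIONS
    if question_format == "true_false":
        return candidate in TRUE_FALSE
    if question_format == "fill_in_the_blank":
        # Assumes blanks are numeric, which may not hold for every sample.
        return NUMERIC.match(candidate) is not None
    # No constraint is assumed for other formats (e.g. table matching).
    return True


def constrained_argmax(scores: Dict[str, float], question_format: str) -> Optional[str]:
    """Pick the highest-scoring candidate that satisfies the constraint.

    Returns None when no candidate is valid for the given format.
    """
    valid = {c: s for c, s in scores.items() if is_valid(c, question_format)}
    if not valid:
        return None
    return max(valid, key=lambda c: valid[c])


if __name__ == "__main__":
    # "Maybe" has the best raw score but is not a legal true/false answer.
    scores = {"Maybe": -0.1, "True": -0.7, "False": -1.2}
    print(constrained_argmax(scores, "true_false"))
```

In this toy setting the constraint is applied to whole candidate answers. A token-level variant would instead mask disallowed tokens at each generation step, which is closer to how constrained decoding is usually done in practice.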