AI Summary
Current multimodal large language models (MLLMs) suffer from limited logical reasoning on complex charts due to the scarcity of high-quality, fine-grained reasoning data, and prompt-based generation methods struggle to ensure both accuracy and diversity. To address this, we propose a function-chain-driven data synthesis paradigm: atomic function chains (e.g., extremum extraction, arithmetic operations) are programmatically enumerated to automatically construct diverse, precise, and interpretable fine-grained reasoning paths, which are then converted into natural-language question-answer pairs by a lightweight open-source LLM. This approach avoids reliance on massive models, mitigates hallucination, and natively supports attribution analysis. Based on it, we introduce ChartCoF, a dataset comprising 1.4k fine-grained reasoning chains and 50k augmented QA pairs. ChartCoF significantly improves MLLM performance on mainstream chart reasoning benchmarks at comparable scale and, for the first time, systematically reveals model capability disparities across distinct reasoning types.
Abstract
Visual reasoning is crucial for multimodal large language models (MLLMs) to address complex chart queries, yet high-quality rationale data remains scarce. Existing methods leverage (M)LLMs for data generation, but direct prompting often yields limited precision and diversity. In this paper, we propose Chain of Functions (CoF), a novel programmatic reasoning data generation pipeline that uses freely explored reasoning paths as supervision to ensure data precision and diversity. Specifically, it starts with human-free exploration over atomic functions (e.g., extremum extraction and arithmetic operations) to generate diverse function chains, which are then translated into linguistic rationales and questions with only a moderately sized open-source LLM. CoF provides multiple benefits: 1) Precision: function-governed generation reduces hallucinations compared to freeform generation; 2) Diversity: enumerating function chains enables varied question taxonomies; 3) Explainability: function chains serve as built-in rationales, allowing fine-grained evaluation beyond overall accuracy; 4) Practicality: it eliminates reliance on extremely large models. Employing CoF, we construct the ChartCoF dataset, with 1.4k complex reasoning Q&A pairs for fine-grained analysis and 50k Q&A pairs for reasoning enhancement. Fine-grained evaluation on ChartCoF reveals varying performance across question taxonomies for each MLLM, and experiments show that finetuning with ChartCoF achieves state-of-the-art performance among same-scale MLLMs on widely used benchmarks. Furthermore, the paradigm of function-governed rationale generation in CoF could inspire broader applications beyond charts.
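The core idea of enumerating atomic function chains and executing them for verified answers can be sketched as follows. This is a minimal illustrative sketch, not the authors' pipeline: the chart data, the atom set, and the templated questions are all hypothetical stand-ins (the real pipeline verbalizes chains with an LLM rather than fixed templates).

```python
# Hypothetical sketch of function-chain-driven QA synthesis:
# compose atomic functions into chains, execute each chain on chart
# data to obtain a verified answer, then template a question.
from itertools import product

# Toy chart data: category -> value (made-up example).
chart = {"Q1": 120, "Q2": 95, "Q3": 180, "Q4": 150}

# Unary atomic functions: each maps the value list to a number,
# paired with a phrase used to verbalize the step.
ATOMS = {
    "max":  (lambda vs: max(vs),           "the maximum value"),
    "min":  (lambda vs: min(vs),           "the minimum value"),
    "sum":  (lambda vs: sum(vs),           "the sum of all values"),
    "mean": (lambda vs: sum(vs) / len(vs), "the average value"),
}
# Binary arithmetic atoms combining two intermediate results.
OPS = {
    "diff":  (lambda a, b: a - b, "the difference between {} and {}"),
    "ratio": (lambda a, b: a / b, "the ratio of {} to {}"),
}

def enumerate_chains(chart):
    """Enumerate 2-step chains: two distinct unary atoms joined by a binary op."""
    vals = list(chart.values())
    for (n1, n2), op in product(product(ATOMS, repeat=2), OPS):
        if n1 == n2:
            continue  # skip degenerate chains like diff(max, max)
        (f1, p1), (f2, p2) = ATOMS[n1], ATOMS[n2]
        g, template = OPS[op]
        answer = g(f1(vals), f2(vals))  # executing the chain yields a verified answer
        question = f"What is {template.format(p1, p2)} in the chart?"
        yield {"chain": [n1, n2, op], "question": question,
               "answer": round(answer, 2)}

samples = list(enumerate_chains(chart))
```

Because the answer is computed by executing the chain rather than generated freeform, each QA pair is precise by construction, and the chain itself serves as a built-in rationale for attribution analysis.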