Chart-FR1: Visual Focus-Driven Fine-Grained Reasoning on Dense Charts

📅 2026-05-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limitations of existing multimodal large language models in interpreting high-information-density charts: insufficient fine-grained perception, susceptibility to redundant visual distractions, and a lack of adaptive deep reasoning. The authors propose Chart-FR1, a model that combines Focus-CoT, a visually grounded chain-of-thought mechanism, with Focus-GRPO, a focus-driven reinforcement learning algorithm. Focus-CoT ties reasoning steps to OCR signals and local image regions for fine-grained perception, while Focus-GRPO adds an information-efficiency reward and an adaptive KL penalty to control focusing efficiency and reasoning depth. The study also introduces HID-Chart, a benchmark with an information-density metric built to evaluate comprehension of high-information-density charts. According to the authors, experiments on multiple chart benchmarks show that Chart-FR1 outperforms state-of-the-art multimodal models in chart understanding and reasoning.
📝 Abstract
Multimodal large language models (MLLMs) have shown considerable potential in chart understanding and reasoning tasks. However, they still struggle with high information density (HID) charts characterized by multiple subplots, legends, and dense annotations due to three major challenges: (1) limited fine-grained perception results in the omission of critical visual cues; (2) redundant or noisy visual information undermines the performance of multimodal reasoning; (3) lack of adaptive deep reasoning relative to the amount of visual information. To tackle these challenges, we present a novel focus-driven fine-grained chart reasoning model, Chart-FR1, to improve perception, focusing efficiency, and adaptive deep reasoning on HID charts. Specifically, we propose Focus-CoT, a visual focusing chain-of-thought that enhances fine-grained perception by explicitly linking reasoning steps to key visual cues, such as local image regions and OCR signals. Building on this, we introduce Focus-GRPO, a focus-driven reinforcement learning algorithm with an information-efficiency reward that compresses redundant visual information for efficient focusing, and an adaptive KL penalty mechanism that enables flexible control over reasoning depth as more visual cues are discovered. Furthermore, to fill the gap in benchmarks for HID charts, we build HID-Chart, a challenging benchmark with an information-density metric designed to evaluate fine-grained chart reasoning capabilities. Extensive experiments on multiple chart benchmarks demonstrate that Chart-FR1 outperforms state-of-the-art MLLMs in chart understanding and reasoning. Code is available at https://github.com/phkhub/Chart-FR1.
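The abstract does not give formulas for Focus-GRPO, so the following is only a minimal sketch of how its three named ingredients might fit together, not the authors' implementation. The group-relative advantage normalization is the standard GRPO step. The function names `info_efficiency_reward` and `adaptive_kl_coeff`, their formulas, and the parameters `alpha` and `base_beta` are hypothetical choices made purely for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standard GRPO-style step: normalize each reward against its sampled group.

    Uses the population standard deviation; real implementations differ on this.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def info_efficiency_reward(num_useful_cues, num_total_cues):
    """HYPOTHETICAL reward: fraction of focused visual cues that were useful.

    The paper's actual information-efficiency reward is not specified in the
    abstract; this ratio only illustrates penalizing redundant focusing.
    """
    if num_total_cues == 0:
        return 0.0
    return num_useful_cues / num_total_cues


def adaptive_kl_coeff(base_beta, num_cues, alpha=0.1):
    """HYPOTHETICAL schedule: relax the KL penalty as more cues are discovered.

    A smaller coefficient lets the policy drift further from the reference
    model, which is one way to permit deeper reasoning on denser charts.
    """
    return base_beta / (1.0 + alpha * num_cues)
```

Under these assumptions, a response that focuses on four regions and uses two of them would receive an efficiency reward of 0.5, and the KL coefficient would halve once ten cues have been found with `alpha=0.1`. The repository linked in the abstract is the place to check the actual formulation.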
Problem

Research questions and friction points this paper is trying to address.

high information density charts
fine-grained perception
multimodal reasoning
visual focus
adaptive reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Focus-CoT
Focus-GRPO
high information density charts
fine-grained reasoning
multimodal reinforcement learning