Chart Deep Research in LVLMs via Parallel Relative Policy Optimization

📅 2026-03-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing chart intelligence methods are largely confined to shallow tasks and struggle to support the complex reasoning and higher-order data analysis that deep research requires. They also suffer from multidimensional reward interference and gradient conflicts arising from heterogeneous training data, while current evaluation frameworks lack quantitative measures for end-to-end analytical capability. To address these challenges, this work proposes the Parallel Relative Policy Optimization (PRPO) framework, which mitigates training instability by decoupling multidimensional rewards and partitioning capability modules by data type. It further introduces MCDR-Bench, a benchmark grounded in the "principle of error uniqueness" that transforms subjective generation assessment into objective error identification via controlled error injection, enabling quantifiable assessment of deep analytical competence. Experiments demonstrate that PRPO and MCDR-Bench together significantly enhance model performance on complex chart analysis tasks, delivering both stable training and objective evaluation.
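The summary describes PRPO only at a high level, but its core idea of "decoupled multidimensional rewards" can be made concrete: normalize group-relative advantages separately per reward dimension before aggregating, rather than collapsing all rewards into one scalar first. The sketch below is a minimal illustration under that reading; the function name, reward dimensions, and mean aggregation are assumptions, not the authors' implementation.

```python
# Minimal sketch of per-dimension group-relative advantage computation,
# in the spirit of PRPO's decoupled multidimensional rewards.
# Hypothetical names and shapes; the paper's exact formulation may differ.
import numpy as np

def parallel_relative_advantages(rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Compute group-relative advantages independently per reward dimension.

    rewards: shape (group_size, num_dims), one row per sampled response,
    one column per reward dimension (e.g. accuracy, reasoning depth, format).
    Normalizing each dimension on its own scale keeps one high-variance
    reward from drowning out the others, which is the interference the
    summary says PRPO mitigates.
    """
    mean = rewards.mean(axis=0, keepdims=True)    # per-dimension group mean
    std = rewards.std(axis=0, keepdims=True)      # per-dimension group std
    per_dim_adv = (rewards - mean) / (std + eps)  # decoupled normalization
    return per_dim_adv.mean(axis=1)               # aggregate only afterwards

# Example: 4 sampled responses scored on 3 reward dimensions.
group_rewards = np.array([
    [0.9, 0.2, 1.0],
    [0.7, 0.8, 0.0],
    [0.1, 0.9, 1.0],
    [0.5, 0.5, 1.0],
])
print(parallel_relative_advantages(group_rewards))
```

Normalizing before aggregating is one plausible way to disentangle reward dimensions; summing raw rewards first would let whichever dimension has the largest spread dominate the advantage signal.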

📝 Abstract
With the rapid advancement of data science, charts have evolved from simple numerical presentation tools to essential instruments for insight discovery and decision-making support. However, current chart data intelligence exhibits significant limitations in deep research capabilities: existing methods predominantly address shallow tasks such as visual recognition or factual question answering, rather than the complex reasoning and high-level data analysis that deep research requires. This limitation stems from two technical bottlenecks. At the training level, existing post-training techniques handle multi-dimensional reward signal interference and heterogeneous-data gradient conflicts poorly, preventing models from developing evenly across capability dimensions. At the evaluation level, current methods remain limited to factual retrieval and basic computation, failing to assess end-to-end analytic reasoning and other deep research capabilities. To address the training challenge, we propose PRPO, which performs parallel optimization across reward dimensions and capability partitioning across data types, effectively disentangling conflicts between heterogeneous data and multi-dimensional reward signals while ensuring optimization stability. For the evaluation challenge, we construct MCDR-Bench based on the "error uniqueness principle," transforming subjective generation assessment into objective error identification through controllable error injection, enabling quantifiable evaluation of deep research capabilities. Experimental validation confirms that PRPO and MCDR-Bench jointly establish a unified framework that systematically advances chart deep research through collaborative training and objective evaluation.
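The "error uniqueness principle" can be sketched concretely: each benchmark item carries exactly one injected error, so whether a model identifies it can be scored objectively instead of judging free-form generation. The error taxonomy, helper names, and substring-based scoring below are illustrative assumptions, not MCDR-Bench's actual construction pipeline.

```python
# Minimal sketch of error-uniqueness evaluation: inject exactly one
# controlled error into a reference analysis, then check whether the
# model's response identifies that single error. Illustrative only.
import random
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalItem:
    corrupted_text: str  # analysis containing exactly one injected error
    error_span: str      # the unique erroneous sentence
    error_type: str      # hypothetical taxonomy, e.g. "trend", "numeric"

def inject_unique_error(reference: list[str],
                        corrupt: Callable[[str], str],
                        error_type: str) -> EvalItem:
    """Corrupt exactly one sentence, honoring error uniqueness."""
    # Only consider sentences the corruption actually changes, so the
    # item is guaranteed to contain one (and only one) injected error.
    candidates = [i for i, s in enumerate(reference) if corrupt(s) != s]
    idx = random.choice(candidates)
    corrupted = list(reference)
    corrupted[idx] = corrupt(reference[idx])
    return EvalItem(" ".join(corrupted), corrupted[idx], error_type)

def identified(prediction: str, item: EvalItem) -> bool:
    """Objective scoring: did the model quote the injected error?"""
    return item.error_span in prediction

# Illustrative trend-flip corruption on a toy reference analysis.
flip_trend = lambda s: s.replace("increased", "decreased")
reference = ["Revenue increased 12% in Q3.", "Margins stayed flat."]
item = inject_unique_error(reference, flip_trend, "trend")
print(item.corrupted_text, "| unique error:", item.error_span)
```

Because every item holds exactly one error, accuracy on the identification task is a single unambiguous number, which is what makes the evaluation quantifiable rather than a subjective rating of generated analyses.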
Problem

Research questions and friction points this paper is trying to address.

chart deep research
LVLMs
multi-dimensional reward
heterogeneous data
analytic reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Parallel Relative Policy Optimization
Chart Deep Research
Multi-dimensional Reward Disentanglement
MCDR-Bench
Error Uniqueness Principle
Authors
Jiajin Tang (ByteDance)
Gaoyang (ByteDance)
Wenjie Wang (ShanghaiTech University)
Sibei Yang (Associate Professor, School of Computer Science and Engineering, Sun Yat-Sen University)
Xing Chen (ByteDance)