OrionBench: A Benchmark for Chart and Human-Recognizable Object Detection in Infographics

📅 2025-05-23
🤖 AI Summary
Current vision-language models (VLMs) lack fine-grained localization of chart structures and human-recognizable objects (HROs) in infographics, which limits their infographic understanding performance. To address this, we introduce OrionBench, the first fine-grained detection benchmark explicitly designed for charts and HROs in infographics, and propose “Thinking-with-Boxes,” a reasoning scheme that integrates object detection into the infographic comprehension pipeline. Our methodology combines model-in-the-loop annotation, programmatically synthesized data generation, bounding-box-supervised learning, and spatial-semantic alignment prompt enhancement. The benchmark comprises 105,000 infographics (26,250 real and 78,750 synthetic) and over 6.9 million bounding box annotations. Experiments show improved VLM accuracy on chart question answering, and the detection model transfers to document layout analysis and UI element detection.

📝 Abstract
Given the central role of charts in scientific, business, and communication contexts, enhancing the chart understanding capabilities of vision-language models (VLMs) has become increasingly critical. A key limitation of existing VLMs lies in their inaccurate visual grounding of infographic elements, including charts and human-recognizable objects (HROs) such as icons and images. This matters because chart understanding often requires identifying relevant elements and then reasoning over them. To address this limitation, we introduce OrionBench, a benchmark designed to support the development of accurate object detection models for charts and HROs in infographics. It contains 26,250 real and 78,750 synthetic infographics, with over 6.9 million bounding box annotations. These annotations are created by combining model-in-the-loop and programmatic methods. We demonstrate the usefulness of OrionBench through three applications: 1) constructing a Thinking-with-Boxes scheme to boost the chart understanding performance of VLMs, 2) comparing existing object detection models, and 3) applying the developed detection model to document layout and UI element detection.

Problem

Research questions and friction points this paper is trying to address.

Improving chart and object detection in infographics for VLMs
Addressing inaccurate visual grounding of infographic elements
Creating a benchmark for accurate object detection models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines model-in-the-loop and programmatic annotation methods
Introduces OrionBench with 105k infographics and 6.9M annotations
Enables Thinking-with-Boxes for VLM chart understanding enhancement
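
The Thinking-with-Boxes idea named above can be illustrated with a minimal sketch, assuming it amounts to handing a VLM the detector's boxes alongside the question. This page does not specify the authors' actual prompt format, so the `Detection` class, the `build_box_prompt` function, the `min_score` threshold, and the text layout are all hypothetical choices made for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Detection:
    """One detected element. All field names are hypothetical."""
    label: str    # e.g. "chart" or "icon"
    x1: int       # left edge, in pixels
    y1: int       # top edge
    x2: int       # right edge
    y2: int       # bottom edge
    score: float  # detector confidence in [0, 1]


def build_box_prompt(question: str, detections: List[Detection],
                     min_score: float = 0.5) -> str:
    """Serialize confident detections in reading order ahead of the question.

    Assumed format, not the paper's: one numbered line per box, then the question.
    """
    kept = [d for d in detections if d.score >= min_score]
    kept.sort(key=lambda d: (d.y1, d.x1))  # top-to-bottom, then left-to-right
    lines = ["Detected elements (label: x1, y1, x2, y2):"]
    for index, d in enumerate(kept, start=1):
        lines.append(f"[{index}] {d.label}: {d.x1}, {d.y1}, {d.x2}, {d.y2}")
    lines.append(f"Question: {question}")
    return "\n".join(lines)


# Made-up detections, used only to exercise the function.
detections = [
    Detection("icon", 40, 300, 90, 350, 0.91),
    Detection("chart", 20, 60, 480, 280, 0.97),
    Detection("icon", 400, 310, 440, 350, 0.22),  # dropped: below min_score
]
prompt = build_box_prompt("Which category has the largest bar?", detections)
print(prompt)
```

Sorting by top-left corner is one simple way to give the model a stable reading order; a real pipeline might instead draw the boxes onto the image or group them by chart.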

Authors

Jiangning Zhu
School of Software, BNRist, Tsinghua University
Yuxing Zhou
School of Software, BNRist, Tsinghua University
Zheng Wang
School of Software, BNRist, Tsinghua University
Juntao Yao
School of Software, BNRist, Tsinghua University
Yima Gu
School of Software, BNRist, Tsinghua University
Yuhui Yuan
Canva CORE, ex-Microsoft Research Asia
Generative AI + Design, Computer Vision
Shixia Liu
Professor, Tsinghua University, IEEE Fellow
interactive machine learning, Data-Centric AI, visual analytics