🤖 AI Summary
Existing captioning methods for scientific figures are trained on figure-caption pairs extracted from documents, many of which are misaligned with reader preferences along dimensions such as helpfulness, explainability, and visual descriptiveness. This paper introduces FigCaps-HF, a framework for figure-caption generation that incorporates domain-expert feedback, using reinforcement learning with human feedback (RLHF) to optimize captions for reader preferences. Key contributions: (1) an RLHF framework for scientific figure captioning; (2) an automatic method for evaluating the quality of figure-caption pairs; and (3) a large-scale benchmark dataset of figure-caption pairs annotated with human feedback. With BLIP as the base model, the RLHF framework achieves mean gains of 35.7%, 16.9%, and 9.0% in ROUGE, BLEU, and METEOR over standard fine-tuning. The benchmark dataset is publicly released.
📝 Abstract
Captions are crucial for understanding scientific visualizations and documents. Existing captioning methods for scientific figures rely on figure-caption pairs extracted from documents for training, many of which fall short on metrics such as helpfulness, explainability, and visual descriptiveness [15], leading to generated captions that are misaligned with reader preferences. To enable the generation of high-quality figure captions, we introduce FigCaps-HF, a new framework for figure-caption generation that incorporates domain-expert feedback to produce captions optimized for reader preferences. Our framework comprises 1) an automatic method for evaluating the quality of figure-caption pairs and 2) a novel reinforcement learning with human feedback (RLHF) method to optimize a generative figure-to-caption model for reader preferences. We demonstrate the effectiveness of our simple learning framework by improving performance over standard fine-tuning across different types of models. In particular, with BLIP as the base model, our RLHF framework achieves mean gains of 35.7%, 16.9%, and 9.0% in ROUGE, BLEU, and METEOR, respectively. Finally, we release a large-scale benchmark dataset with human feedback on figure-caption pairs to enable further evaluation and development of RLHF techniques for this problem.
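To make the two-part framework concrete, here is a minimal sketch of the simplest RLHF variant consistent with the abstract: a learned quality evaluator scores each figure-caption pair, and those scores re-weight the captioning model's per-example loss so that preferred captions contribute more to the gradient. All names here (`quality_score`, `reward_weighted_loss`) are illustrative assumptions; the abstract does not specify the evaluator's architecture or the exact RLHF objective, and the toy scorer below merely stands in for the real learned model.

```python
def quality_score(caption: str) -> float:
    """Hypothetical stand-in for the learned figure-caption quality
    evaluator. This toy rule favors longer, more descriptive captions;
    the real evaluator would be a model trained on human feedback."""
    return min(len(caption.split()) / 20.0, 1.0)


def reward_weighted_loss(captions, base_losses):
    """Re-weight each caption's base (e.g. cross-entropy) loss by its
    predicted quality, so training emphasizes reader-preferred captions."""
    weights = [quality_score(c) for c in captions]
    total = sum(weights) or 1.0  # guard against all-zero scores
    return sum(w * l for w, l in zip(weights, base_losses)) / total


# Toy usage: a terse caption vs. a visually descriptive one.
captions = [
    "A plot.",
    "Line chart of accuracy versus training epochs for three models.",
]
print(reward_weighted_loss(captions, [2.0, 1.0]))
```

In practice the weighting would be applied inside the fine-tuning loop of the base captioning model (e.g. BLIP), but the core idea, scoring pairs and letting the score modulate the learning signal, is captured by this small function.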