FigCaps-HF: A Figure-to-Caption Generative Framework and Benchmark with Human Feedback

📅 2023-07-20
🏛️ arXiv.org
📈 Citations: 8 (Influential: 0)
🤖 AI Summary
Existing scientific chart captioning methods rely on document-extracted chart-caption pairs for training, resulting in significant misalignment with reader preferences regarding helpfulness, interpretability, and visual descriptiveness. This paper introduces the first RLHF-driven chart caption generation framework specifically designed for scientific charts, leveraging domain-expert feedback to optimize caption quality. Our key contributions are: (1) establishing a scientific-chart-specific RLHF paradigm; (2) designing an automated chart-caption quality evaluator; and (3) releasing SciCap-HF, the first large-scale scientific chart-caption benchmark annotated with human feedback. Fine-tuning BLIP via RLHF yields substantial improvements: ROUGE, BLEU, and METEOR scores increase by 35.7%, 16.9%, and 9.0%, respectively, demonstrating marked gains in caption utility and interpretability. Both code and dataset are publicly released.
📝 Abstract
Captions are crucial for understanding scientific visualizations and documents. Existing captioning methods for scientific figures rely on figure-caption pairs extracted from documents for training, many of which fall short with respect to metrics like helpfulness, explainability, and visual-descriptiveness [15], leading to generated captions that are misaligned with reader preferences. To enable the generation of high-quality figure captions, we introduce FigCaps-HF, a new framework for figure-caption generation that can incorporate domain-expert feedback to generate captions optimized for reader preferences. Our framework comprises 1) an automatic method for evaluating the quality of figure-caption pairs, and 2) a novel reinforcement learning with human feedback (RLHF) method to optimize a generative figure-to-caption model for reader preferences. We demonstrate the effectiveness of our simple learning framework by improving performance over standard fine-tuning across different types of models. In particular, when using BLIP as the base model, our RLHF framework achieves a mean gain of 35.7%, 16.9%, and 9% in ROUGE, BLEU, and METEOR, respectively. Finally, we release a large-scale benchmark dataset with human feedback on figure-caption pairs to enable further evaluation and development of RLHF techniques for this problem.
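The RLHF fine-tuning described in the abstract can be sketched, under heavy simplification, as reward-weighted loss scaling: captions that a feedback model scores above a baseline contribute more to the captioning loss, low-scored captions less. The paper's actual objective, model, and reward scale are not reproduced here; every name and number below is illustrative.

```python
def reward_weighted_loss(token_logprobs, reward, baseline=0.5):
    """Toy RLHF-style objective: scale the negative log-likelihood of a
    caption by how far its predicted human-feedback score exceeds a
    baseline. Captions judged better than the baseline are up-weighted;
    worse ones are dropped (hinged to zero). Illustrative only."""
    nll = -sum(token_logprobs)            # standard captioning loss
    weight = max(reward - baseline, 0.0)  # hinge on the feedback score
    return weight * nll

# Two hypothetical captions for the same figure: one with a high
# predicted reader-preference score, one with a low score.
good = reward_weighted_loss([-0.1, -0.2, -0.1], reward=0.9)
bad = reward_weighted_loss([-0.1, -0.2, -0.1], reward=0.3)
```

The hinge means low-quality pairs simply stop contributing gradient, which is one simple way to bias training toward reader-preferred captions without collecting new data.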
Problem

Research questions and friction points this paper is trying to address.

Generating high-quality captions for scientific figures
Aligning captions with reader preferences and metrics
Incorporating human feedback to optimize caption generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Framework integrating expert feedback for captions
Reinforcement learning optimizes caption generation
Automatic evaluation of figure-caption quality
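The paper's automatic quality evaluator predicts expert feedback scores; as a minimal, lexical stand-in, the sketch below computes ROUGE-1 F1, one of the overlap metrics the paper reports. Real evaluations use proper tokenization and stemming; this version splits on whitespace only.

```python
from collections import Counter

def rouge1_f1(candidate, reference):
    """ROUGE-1 F1: unigram overlap between a generated caption and a
    reference caption. A minimal sketch of one reported metric, not the
    paper's learned quality evaluator."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())   # shared unigram counts
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

score = rouge1_f1("accuracy increases with model size",
                  "accuracy increases as model size grows")
```

A scorer like this can rank figure-caption pairs cheaply, but only a feedback-trained evaluator captures preferences such as helpfulness that pure n-gram overlap misses.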
👥 Authors
Ashish Singh
CICS, University of Massachusetts Amherst
Prateek R. Agarwal
CICS, University of Massachusetts Amherst
Zixuan Huang
CICS, University of Massachusetts Amherst
Arpita Singh
CICS, University of Massachusetts Amherst
Tong Yu
Adobe Research
Sungchul Kim
Adobe
Victor S. Bursztyn
Adobe
N. Vlassis
Adobe Research
Ryan A. Rossi
Adobe Research