🤖 AI Summary
To generate scientific figure captions that are both accurate and faithful to the author's writing style, this paper proposes a two-stage generation framework. In the first stage, context filtering is combined with category-specific prompt optimization using DSPy optimizers (MIPROv2 and SIMBA). In the second stage, few-shot prompting with profile figures from the LaMP-Cap dataset refines the caption's style. This work represents the first approach to jointly enhance both factual fidelity and stylistic consistency in scientific figure captioning. Experimental results show that category-specific prompting improves ROUGE-1 recall by 8.3%; adding profile-informed stylistic refinement further boosts BLEU by 40–48% and ROUGE-L by 25–27%, outperforming zero-shot and generic prompt-optimization baselines.
📝 Abstract
Scientific figure captions require both accuracy and stylistic consistency to convey visual information. Here, we present a domain-specific caption generation system for the 3rd SciCap Challenge that integrates figure-related textual context with author-specific writing styles using the LaMP-Cap dataset. Our approach uses a two-stage pipeline: Stage 1 combines context filtering, category-specific prompt optimization via DSPy's MIPROv2 and SIMBA, and caption candidate selection; Stage 2 applies few-shot prompting with profile figures for stylistic refinement. Our experiments demonstrate that category-specific prompts outperform both zero-shot and general optimized approaches, improving ROUGE-1 recall by 8.3% while limiting the precision loss to 2.8% and the BLEU-4 reduction to 10.9%. Profile-informed stylistic refinement yields gains of 40–48% in BLEU scores and 25–27% in ROUGE. Overall, our system demonstrates that combining contextual understanding with author-specific stylistic adaptation can generate captions that are both scientifically accurate and stylistically faithful to the source paper.
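The two-stage prompt flow described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the class names, the category instructions, the keyword-based context filter, and the prompt wording are all hypothetical. In the paper, the per-category instructions come from DSPy prompt optimization (MIPROv2 and SIMBA) rather than being written by hand, and the model call itself is omitted here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProfileFigure:
    """Another figure from the same paper, with its author-written caption."""
    figure_type: str
    caption: str


@dataclass
class TargetFigure:
    """The figure to caption, plus its textual context and profile figures."""
    figure_type: str
    context_paragraphs: List[str]
    profile: List[ProfileFigure] = field(default_factory=list)


# Hypothetical hand-written stand-ins for the optimized, category-specific prompts.
CATEGORY_PROMPTS = {
    "graph plot": "Name the plotted variables and state the main trend.",
    "scatterplot": "Name both axes and describe the relationship shown.",
}
DEFAULT_PROMPT = "Write a concise caption for this scientific figure."


def select_instruction(figure_type: str) -> str:
    """Stage 1: pick the instruction for the figure's category, if one exists."""
    return CATEGORY_PROMPTS.get(figure_type.lower(), DEFAULT_PROMPT)


def filter_context(paragraphs: List[str], max_paragraphs: int = 2) -> List[str]:
    """Stage 1: assumed placeholder filter that keeps figure-mentioning paragraphs."""
    kept = [p for p in paragraphs if "fig" in p.lower()]
    # Fall back to the unfiltered context when nothing mentions a figure.
    return (kept or paragraphs)[:max_paragraphs]


def build_stage1_prompt(target: TargetFigure) -> str:
    """Stage 1: category-specific instruction followed by the filtered context."""
    context = "\n".join(filter_context(target.context_paragraphs))
    return f"{select_instruction(target.figure_type)}\n\nContext:\n{context}\n\nCaption:"


def build_stage2_prompt(target: TargetFigure, draft: str, max_examples: int = 3) -> str:
    """Stage 2: few-shot prompt that shows profile captions as style examples."""
    examples = "\n".join(
        f"Example caption {i}: {p.caption}"
        for i, p in enumerate(target.profile[:max_examples], start=1)
    )
    return (
        "Rewrite the draft caption to match the style of the example captions "
        "from the same paper, without changing its factual content.\n\n"
        f"{examples}\n\nDraft caption: {draft}\n\nRewritten caption:"
    )
```

In this sketch, the Stage 1 prompt would be sent to a captioning model to produce a draft, and the Stage 2 prompt would then be sent with that draft. Candidate selection between multiple Stage 1 captions is left out for brevity.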