Retrieval-Augmented Long-Context Translation for Cultural Image Captioning: Gators submission for AmericasNLP 2026 shared task

📅 2026-05-19

🤖 AI Summary
This work addresses low-resource image captioning for three Indigenous American languages (Bribri, Guaraní, and Orizaba Nahuatl) with a two-stage pipeline. First, Qwen2.5-VL generates an intermediate caption in Spanish; then Gemini 2.5 Flash produces the target-language caption using retrieval-augmented many-shot prompting. Retrieval proves highly language-dependent, helping only when a large, in-domain corpus is available, and synthetic data augmentation accounts for around 28 chrF++ of the Guaraní dev set gain. On the dev set, the method improves over the shared task baseline by 164.1% (Bribri), 131.7% (Guaraní), and 122.6% (Orizaba Nahuatl); on the test set, gains remain above 150% for Bribri and Orizaba Nahuatl. The submission is the overall winner of the shared task and placed second of five finalist submissions in human evaluation of target-language captions.
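To make the percentages concrete, here is a minimal sketch that assumes the reported improvements are relative gains in the task metric over the baseline score. The function names and the baseline score of 10.0 are hypothetical and chosen only for illustration; they are not figures from the paper.

```python
def relative_improvement(baseline: float, system: float) -> float:
    """Relative gain of `system` over `baseline`, in percent."""
    return (system - baseline) / baseline * 100.0


def score_ratio(improvement_pct: float) -> float:
    """Ratio system/baseline implied by a relative improvement in percent."""
    return 1.0 + improvement_pct / 100.0


# Under this reading, a 164.1% improvement means the system scores
# about 2.641 times the baseline.
bribri_ratio = score_ratio(164.1)

# Hypothetical baseline score (NOT from the paper), used only to show the arithmetic.
hypothetical_baseline = 10.0
hypothetical_system = hypothetical_baseline * bribri_ratio
```

Under the same assumption, the Guaraní and Orizaba Nahuatl dev set figures correspond to scores of roughly 2.3 and 2.2 times the baseline.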
📝 Abstract
We present the University of Florida Gators submission to the AmericasNLP 2026 shared task on cultural image captioning for Indigenous languages. Our two-stage pipeline generates a Spanish intermediate caption with Qwen2.5-VL, then produces the target-language caption using retrieval-augmented many-shot prompting with Gemini 2.5 Flash. We achieve 164.1%, 131.7%, and 122.6% improvements over the shared task baseline for Bribri, Guaraní, and Orizaba Nahuatl captioning, respectively, in our dev set evaluation and maintain >150% improvements for the Bribri and Orizaba Nahuatl languages in the test set evaluation. We find retrieval is highly language-dependent, beneficial only for large, in-domain corpora, and that synthetic data augmentation accounts for around 28 chrF++ of the dev set Guaraní performance gain. Our submission is the overall winner of the shared task, placing second out of five finalist submissions in human evaluations of target-language captions.
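The second stage of the pipeline can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: the abstract does not say how examples are retrieved or how the prompt is worded, so the character n-gram similarity, the prompt template, and all function names here are assumptions. The calls to Qwen2.5-VL and Gemini 2.5 Flash are left out, and the target-language strings are placeholders rather than real translations.

```python
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Bag of character n-grams over whitespace-normalised, lowercased text."""
    text = " ".join(text.lower().split())
    return Counter(text[i:i + n] for i in range(max(len(text) - n + 1, 1)))


def similarity(a: str, b: str) -> float:
    """Dice overlap between n-gram bags (assumed retriever, not from the paper)."""
    ga, gb = char_ngrams(a), char_ngrams(b)
    overlap = sum((ga & gb).values())
    total = sum(ga.values()) + sum(gb.values())
    return 2 * overlap / total if total else 0.0


def retrieve(query: str, corpus: list[tuple[str, str]], k: int) -> list[tuple[str, str]]:
    """Return the k (Spanish, target) pairs whose Spanish side is closest to the query."""
    ranked = sorted(corpus, key=lambda pair: similarity(query, pair[0]), reverse=True)
    return ranked[:k]


def build_prompt(spanish_caption: str, examples: list[tuple[str, str]], target_language: str) -> str:
    """Assemble a many-shot translation prompt (hypothetical template)."""
    lines = [f"Translate the Spanish caption into {target_language}.", ""]
    for es, tgt in examples:
        lines += [f"Spanish: {es}", f"{target_language}: {tgt}", ""]
    lines += [f"Spanish: {spanish_caption}", f"{target_language}:"]
    return "\n".join(lines)


# Toy parallel corpus; target sides are placeholders, not real Bribri text.
corpus = [
    ("una mujer teje una canasta", "<target sentence 1>"),
    ("el río está crecido", "<target sentence 2>"),
    ("un hombre teje una hamaca", "<target sentence 3>"),
]

# In the described pipeline this caption would come from the stage-one model.
spanish_caption = "una mujer teje una hamaca"
examples = retrieve(spanish_caption, corpus, k=2)
prompt = build_prompt(spanish_caption, examples, "Bribri")
```

The resulting prompt would then be sent to the stage-two model. Since the abstract reports that retrieval helps only for large, in-domain corpora, a real system might fall back to a fixed set of examples for languages where retrieval does not help.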
Problem

Research questions and friction points this paper is trying to address.

cultural image captioning
Indigenous languages
long-context translation
low-resource languages
Innovation

Methods, ideas, or system contributions that make the work stand out.

retrieval-augmented generation
many-shot prompting
cultural image captioning
low-resource languages
two-stage translation pipeline