Retrieval-Augmented Long-Context Translation for Cultural Image Captioning: Gators submission for AmericasNLP 2026 shared task

📅 2026-05-19

🤖 AI Summary

This work addresses low-resource image captioning for three Indigenous languages of the Americas (Bribri, Guaraní, and Orizaba Nahuatl) with a two-stage pipeline. First, Qwen2.5-VL generates an intermediate caption in Spanish; then Gemini 2.5 Flash produces the target-language caption using retrieval-augmented many-shot prompting. Retrieval proves highly language-dependent, helping only when a large, in-domain corpus is available, and synthetic data augmentation accounts for around 28 chrF++ of the Guaraní dev set gain. On the dev set, the system improves over the shared task baseline by 164.1% (Bribri), 131.7% (Guaraní), and 122.6% (Orizaba Nahuatl); on the test set, gains remain above 150% for Bribri and Orizaba Nahuatl. The submission is the overall winner of the shared task and placed second out of five finalist submissions in human evaluation of target-language captions.
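
To make the percentage figures concrete, here is a minimal sketch of the arithmetic, assuming the reported improvements are relative gains in the task metric over the baseline score. The scores 10.0 and 26.41 are hypothetical placeholders chosen only to illustrate the calculation; they are not results from the paper.

```python
def relative_improvement(system_score: float, baseline_score: float) -> float:
    """Relative gain of the system over the baseline, in percent."""
    if baseline_score <= 0:
        raise ValueError("baseline_score must be positive")
    return (system_score - baseline_score) / baseline_score * 100.0


def implied_ratio(improvement_pct: float) -> float:
    """System-to-baseline score ratio implied by a relative improvement."""
    return 1.0 + improvement_pct / 100.0


# Hypothetical scores (NOT from the paper), used only to show the arithmetic.
example_gain = relative_improvement(system_score=26.41, baseline_score=10.0)

# Under the relative-gain assumption, a 164.1% improvement means the system
# scores roughly 2.64 times the baseline.
bribri_ratio = implied_ratio(164.1)
```

Under this reading, an improvement above 100% means the system more than doubles the baseline score.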

📝 Abstract

We present the University of Florida Gators submission to the AmericasNLP 2026 shared task on cultural image captioning for Indigenous languages. Our two-stage pipeline generates a Spanish intermediate caption with Qwen2.5-VL, then produces the target-language caption using retrieval-augmented many-shot prompting with Gemini 2.5 Flash. We achieve 164.1%, 131.7%, and 122.6% improvements over the shared task baseline for Bribri, Guaraní, and Orizaba Nahuatl captioning, respectively, in our dev set evaluation and maintain >150% improvements for the Bribri and Orizaba Nahuatl languages in the test set evaluation. We find retrieval is highly language-dependent, beneficial only for large, in-domain corpora, and that synthetic data augmentation accounts for around 28 chrF++ of the dev set Guaraní performance gain. Our submission is the overall winner of the shared task, placing second out of five finalist submissions in human evaluations of target-language captions.
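
The second stage can be illustrated with a minimal sketch of retrieval-augmented many-shot prompt construction. Everything here is an assumption made for illustration: the abstract does not specify the retriever, the prompt format, or the number of examples, so the character n-gram Dice similarity, the helper names (`retrieve`, `build_prompt`), and the prompt wording are hypothetical. The target-side strings are placeholders rather than real Bribri text, and the call to the language model is left out.

```python
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams of a lowercased, whitespace-normalized string."""
    text = " ".join(text.lower().split())
    return Counter(text[i:i + n] for i in range(max(len(text) - n + 1, 0)))


def overlap_score(a: str, b: str) -> float:
    """Dice coefficient over character n-gram multisets, between 0.0 and 1.0."""
    grams_a, grams_b = char_ngrams(a), char_ngrams(b)
    total = sum(grams_a.values()) + sum(grams_b.values())
    if total == 0:
        return 0.0
    shared = sum((grams_a & grams_b).values())  # multiset intersection
    return 2 * shared / total


def retrieve(query: str, corpus: list, k: int) -> list:
    """Return the k (spanish, target) pairs whose Spanish side best matches the query."""
    ranked = sorted(corpus, key=lambda pair: overlap_score(query, pair[0]), reverse=True)
    return ranked[:k]


def build_prompt(spanish_caption: str, examples: list, target_language: str) -> str:
    """Assemble a many-shot translation prompt from retrieved parallel examples."""
    lines = [f"Translate the Spanish caption into {target_language}.", ""]
    for spanish, target in examples:
        lines += [f"Spanish: {spanish}", f"{target_language}: {target}", ""]
    lines += [f"Spanish: {spanish_caption}", f"{target_language}:"]
    return "\n".join(lines)


# Toy parallel corpus; the target side uses placeholders, not real Bribri text.
corpus = [
    ("Una mujer teje una canasta", "<target sentence 1>"),
    ("Un perro corre en el campo", "<target sentence 2>"),
    ("Dos mujeres tejen canastas", "<target sentence 3>"),
]

# Stage 1 output (the Spanish intermediate caption) is assumed to be given.
intermediate_caption = "Una mujer teje una canasta grande"
examples = retrieve(intermediate_caption, corpus, k=2)
prompt = build_prompt(intermediate_caption, examples, "Bribri")
```

In a many-shot setting `k` would be far larger than 2, which is where a long-context model becomes relevant. Since the abstract reports that retrieval helps only for large, in-domain corpora, a real system would likely need a per-language switch between retrieved and fixed examples.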

Problem

Research questions and friction points this paper is trying to address.

cultural image captioning

Indigenous languages

long-context translation

low-resource languages

Innovation

Methods, ideas, or system contributions that make the work stand out.

retrieval-augmented generation

many-shot prompting

cultural image captioning

low-resource languages