🤖 AI Summary
This work addresses low-resource image captioning for three Indigenous American languages (Bribri, Guaraní, and Orizaba Nahuatl) with a two-stage pipeline. First, Qwen2.5-VL generates an intermediate caption in Spanish; then Gemini 2.5 Flash produces the target-language caption using retrieval-augmented many-shot prompting. On the development set, the method improves over the shared task baseline by 164.1% (Bribri), 131.7% (Guaraní), and 122.6% (Orizaba Nahuatl); on the test set, gains remain above 150% for Bribri and Orizaba Nahuatl. Retrieval proves highly language-dependent, helping only when a large, in-domain corpus is available, and synthetic data augmentation accounts for around 28 chrF++ of the Guaraní dev-set gain. The submission was the overall winner of the shared task and placed second of five finalist submissions in human evaluation of target-language captions.
📝 Abstract
We present the University of Florida Gators submission to the AmericasNLP 2026 shared task on cultural image captioning for Indigenous languages. Our two-stage pipeline generates a Spanish intermediate caption with Qwen2.5-VL, then produces the target-language caption using retrieval-augmented many-shot prompting with Gemini 2.5 Flash. We achieve 164.1%, 131.7%, and 122.6% improvements over the shared task baseline for Bribri, Guaraní, and Orizaba Nahuatl captioning, respectively, in our dev set evaluation and maintain >150% improvements for the Bribri and Orizaba Nahuatl languages in the test set evaluation. We find retrieval is highly language-dependent, beneficial only for large, in-domain corpora, and that synthetic data augmentation accounts for around 28 chrF++ of the dev set Guaraní performance gain. Our submission is the overall winner of the shared task, placing second out of five finalist submissions in human evaluations of target-language captions.
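The second stage described above, retrieval-augmented many-shot prompting, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not say how examples are retrieved or how the prompt is worded, so the character-trigram Dice similarity, the function names, and the prompt template are hypothetical, and the target-language strings are placeholders rather than real data. The calls to Qwen2.5-VL and Gemini 2.5 Flash are omitted; the sketch only covers selecting parallel examples for a Spanish intermediate caption and assembling them into a prompt.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Multiset of character n-grams over lowercased, whitespace-normalized text."""
    text = " ".join(text.lower().split())
    return Counter(text[i:i + n] for i in range(max(len(text) - n + 1, 1)))


def overlap_score(a, b):
    """Dice coefficient over character n-gram multisets (assumed similarity measure)."""
    ca, cb = char_ngrams(a), char_ngrams(b)
    shared = sum((ca & cb).values())
    total = sum(ca.values()) + sum(cb.values())
    return 2 * shared / total if total else 0.0


def retrieve_examples(spanish_caption, parallel_corpus, k=3):
    """Return the k (Spanish, target) pairs whose Spanish side is closest to the caption."""
    ranked = sorted(
        parallel_corpus,
        key=lambda pair: overlap_score(spanish_caption, pair[0]),
        reverse=True,
    )
    return ranked[:k]


def build_many_shot_prompt(spanish_caption, examples, target_language):
    """Assemble a many-shot prompt (hypothetical template) ending at the slot to fill."""
    lines = [f"Translate the Spanish image caption into {target_language}.", ""]
    for spanish, target in examples:
        lines.append(f"Spanish: {spanish}")
        lines.append(f"{target_language}: {target}")
        lines.append("")
    lines.append(f"Spanish: {spanish_caption}")
    lines.append(f"{target_language}:")
    return "\n".join(lines)


# Placeholder corpus: the target-side strings stand in for real parallel data.
corpus = [
    ("Una mujer teje una canasta", "<target caption 1>"),
    ("Un río en la montaña", "<target caption 2>"),
    ("Un hombre teje una hamaca", "<target caption 3>"),
]
query = "Una mujer teje una hamaca"
shots = retrieve_examples(query, corpus, k=2)
prompt = build_many_shot_prompt(query, shots, "Bribri")
```

In a full pipeline, `query` would be the Spanish caption produced in the first stage, and `prompt` would be sent to the text-generation model. Since the abstract reports that retrieval helps only with large, in-domain corpora, a setup like this would plausibly fall back to a fixed example set for languages where the retrieval corpus is small or out of domain.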