What are Foundation Models Cooking in the Post-Soviet World?

📅 2025-02-25
🤖 AI Summary
This work shows that foundation models have systematic gaps in their knowledge of post-Soviet cuisine, most visibly when asked where a dish originates. To study this, the authors introduce BORSch, a multimodal dish dataset in Russian and Ukrainian (1,147 and 823 dishes, respectively), and use it to evaluate leading foundation models. The evaluation combines text-only and multimodal QA, an analysis of pretraining data, and a visual description task. Models tend to over-predict countries linked to the language of the question (for example, Russian-language questions pull predictions toward Russia), a pattern the authors trace to misleading dish-origin co-occurrences and Russian-Ukrainian code mixing in pretraining data. Performance on visual description correlates only weakly with QA performance, which suggests that QA alone may be insufficient for evaluating cultural understanding.

📝 Abstract
The culture of the Post-Soviet states is complex, shaped by a turbulent history that continues to influence current events. In this study, we investigate the Post-Soviet cultural food knowledge of foundation models by constructing BORSch, a multimodal dataset encompassing 1147 and 823 dishes in the Russian and Ukrainian languages, centered around the Post-Soviet region. We demonstrate that leading models struggle to correctly identify the origins of dishes from Post-Soviet nations in both text-only and multimodal Question Answering (QA), instead over-predicting countries linked to the language the question is asked in. Through analysis of pretraining data, we show that these results can be explained by misleading dish-origin co-occurrences, along with linguistic phenomena such as Russian-Ukrainian code mixing. Finally, to move beyond QA-based assessments, we test models' abilities to produce accurate visual descriptions of dishes. The weak correlation between this task and QA suggests that QA alone may be insufficient as an evaluation of cultural understanding. To foster further research, we will make BORSch publicly available at https://github.com/alavrouk/BORSch.
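
The over-prediction effect described in the abstract can be illustrated with a small tally. This is a minimal sketch rather than the authors' evaluation code: the record layout, the `over_prediction` helper, and the toy rows are all assumptions made for illustration, and none of the values come from BORSch.

```python
from collections import Counter


def over_prediction(records, language_country):
    """Compare how often a country is predicted with how often it is correct.

    `records` is a list of dicts with hypothetical keys "gold" and "predicted".
    `language_country` is the country associated with the question language.
    """
    if not records:
        raise ValueError("records must not be empty")
    predicted = Counter(r["predicted"] for r in records)
    gold = Counter(r["gold"] for r in records)
    n = len(records)
    return {
        # Share of all answers that name the language-linked country.
        "predicted_share": predicted[language_country] / n,
        # Share of dishes that actually originate there.
        "gold_share": gold[language_country] / n,
        # Dishes wrongly attributed to the language-linked country.
        "wrongly_attributed": sum(
            1
            for r in records
            if r["predicted"] == language_country and r["gold"] != language_country
        ),
    }


# Toy, made-up records for questions asked in Russian (not BORSch data).
records = [
    {"dish": "dish_a", "gold": "Ukraine", "predicted": "Russia"},
    {"dish": "dish_b", "gold": "Georgia", "predicted": "Russia"},
    {"dish": "dish_c", "gold": "Russia", "predicted": "Russia"},
    {"dish": "dish_d", "gold": "Uzbekistan", "predicted": "Uzbekistan"},
]
stats = over_prediction(records, "Russia")
print(stats)
```

A predicted share well above the gold share, together with a nonzero count of wrong attributions, is the kind of signal the abstract refers to as over-predicting countries linked to the question language.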

Problem

Research questions and friction points this paper is trying to address.

Probing what foundation models know about the food culture of the Post-Soviet region.
Measuring how accurately models identify dish origins in text-only and multimodal QA.
Testing whether models can produce accurate visual descriptions of dishes.

Innovation

Methods, ideas, or system contributions that make the work stand out.

BORSch, a multimodal dish dataset in Russian and Ukrainian
Analysis of over-predicted origin countries, supported by pretraining data
Visual description testing as a complement to QA
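
To make the comparison between visual description and QA concrete, one option is a correlation coefficient over paired scores. The sketch below uses Pearson correlation as an assumption; this summary does not state which statistic the paper uses, and the `pearson` helper and the toy scores are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Toy, made-up paired scores (not results from the paper).
qa_accuracy = [0.2, 0.4, 0.6, 0.8]
description_score = [0.5, 0.3, 0.6, 0.4]
r = pearson(qa_accuracy, description_score)
print(f"correlation: {r:.3f}")
```

A coefficient near zero on real paired scores would mean that doing well on QA says little about description quality, which is the reasoning behind treating QA alone as an incomplete measure.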