Towards Automatic Evaluation for Image Transcreation

📅 2024-12-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
Automated quality assessment for cross-cultural image localization remains underexplored. Method: This paper introduces the first systematic automatic evaluation framework for image transcreation, covering cultural relevance, semantic equivalence, and visual similarity. It proposes three metric categories: object-level (detection-based), embedding-level (using frozen vision-encoder representations), and VLM-level (leveraging vision-language models), grounded in translation studies theory and real-world transcreation practice and inspired by machine translation evaluation paradigms. Contribution/Results: Meta-evaluated against human annotations from seven countries, the metrics achieve average segment-level correlations of 0.55–0.87 with human ratings. Experiments show VLMs excel at assessing cultural relevance and semantic equivalence, while frozen vision encoders are best at measuring visual similarity. This work establishes the first automated evaluation framework for image transcreation, addressing a critical gap in the field.
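The meta-evaluation described above (segment-level correlation between metric scores and human ratings) can be illustrated with a minimal sketch; the scores below are hypothetical, and Pearson correlation is used here as an assumption (the paper may report other correlation statistics):

```python
import numpy as np

# Hypothetical per-segment metric scores and human ratings (not from the paper).
metric_scores = np.array([0.2, 0.5, 0.6, 0.8, 0.9])
human_ratings = np.array([1, 3, 3, 4, 5])

# Segment-level agreement as a Pearson correlation coefficient.
r = np.corrcoef(metric_scores, human_ratings)[0, 1]
print(round(r, 3))  # close to 1.0 for these aligned toy scores
```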

📝 Abstract
Beyond conventional paradigms of translating speech and text, recently, there has been interest in automated transcreation of images to facilitate localization of visual content across different cultures. Attempts to define this as a formal Machine Learning (ML) problem have been impeded by the lack of automatic evaluation mechanisms, with previous work relying solely on human evaluation. In this paper, we seek to close this gap by proposing a suite of automatic evaluation metrics inspired by machine translation (MT) metrics, categorized into: a) Object-based, b) Embedding-based, and c) VLM-based. Drawing on theories from translation studies and real-world transcreation practices, we identify three critical dimensions of image transcreation: cultural relevance, semantic equivalence and visual similarity, and design our metrics to evaluate systems along these axes. Our results show that proprietary VLMs best identify cultural relevance and semantic equivalence, while vision-encoder representations are adept at measuring visual similarity. Meta-evaluation across 7 countries shows our metrics agree strongly with human ratings, with average segment-level correlations ranging from 0.55–0.87. Finally, through a discussion of the merits and demerits of each metric, we offer a robust framework for automated image transcreation evaluation, grounded in both theoretical foundations and practical application. Our code can be found here: https://github.com/simran-khanuja/automatic-eval-transcreation
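As a rough illustration of the embedding-based category, visual similarity between a source and a transcreated image can be scored as the cosine similarity of their frozen vision-encoder representations. This is a minimal sketch, not the paper's implementation: the `toy_encode` function below is a hypothetical stand-in for a real frozen encoder (e.g. CLIP or DINO features).

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def visual_similarity(src_img: np.ndarray, tgt_img: np.ndarray, encode) -> float:
    """Embedding-level visual similarity: encode both images with a
    frozen vision encoder, then compare the representations."""
    return cosine_similarity(encode(src_img), encode(tgt_img))

# Hypothetical stand-in encoder: mean-pools pixels into a fixed-length
# vector. A real system would extract frozen vision-encoder features.
def toy_encode(img: np.ndarray) -> np.ndarray:
    return img.reshape(-1, img.shape[-1]).mean(axis=0)

img_a = np.ones((8, 8, 3))        # toy "source" image
img_b = np.ones((8, 8, 3)) * 0.5  # toy "transcreated" image
print(visual_similarity(img_a, img_b, toy_encode))  # -> 1.0 (same direction)
```

Cosine similarity is scale-invariant, so the two toy images score 1.0 despite differing brightness; real encoder features would distinguish far more than pixel means do.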
Problem

Research questions and friction points this paper is trying to address.

Automated Image Quality Assessment
Image Transformation Quality
Machine Evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automatic Image Translation Evaluation
Vision-Language Model
Vision Encoder Technology