AI Summary
Problem: Evaluating visual consistency, spatial alignment, and stylistic integrity in multilingual advertising localization remains inefficient and challenging. Method: This paper introduces a structured human-in-the-loop framework that, to the best of the authors' knowledge, is the first to integrate scene text detection, image inpainting, machine translation (MT), and text reimposition to accelerate the evaluation of localized advertisements. Contribution/Results: The framework is demonstrated on six locales, aiming to preserve semantic fidelity while maintaining visual consistency, spatial alignment, and stylistic integrity. Qualitative results indicate that the localized advertisements are semantically accurate and visually coherent, and suitable for deployment in real-world advertising workflows. These results support the framework's practicality across languages.
Abstract
Adapting advertisements for multilingual audiences requires more than simple text translation; it demands preservation of visual consistency, spatial alignment, and stylistic integrity across diverse languages and formats. We introduce a structured framework that combines automated components with human oversight to address the complexities of advertisement localization. To the best of our knowledge, this is the first work to integrate scene text detection, inpainting, machine translation (MT), and text reimposition specifically for accelerating ad localization evaluation workflows. Qualitative results across six locales demonstrate that our approach produces semantically accurate and visually coherent localized advertisements, suitable for deployment in real-world workflows.
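The four stages named above, followed by a human review step, can be pictured as a simple pipeline. The sketch below is illustrative only and is not the authors' implementation: every name in it (`TextRegion`, `LocalizedAd`, `localize`, and the `fake_*` stand-ins) is hypothetical, the image is represented by a plain string, and each stage is passed in as a callable so that a real detector, inpainting model, MT system, or reviewer interface could be substituted.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


@dataclass
class TextRegion:
    """One detected text region and its translation (hypothetical type)."""
    box: Box
    source_text: str
    translated_text: str = ""


@dataclass
class LocalizedAd:
    """Draft localized ad awaiting human review (hypothetical type)."""
    locale: str
    background: str            # stand-in for the inpainted, text-free image
    regions: List[TextRegion]  # translated text paired with its original box
    approved: bool = False


def localize(
    image: str,
    locale: str,
    detect: Callable[[str], List[TextRegion]],
    inpaint: Callable[[str, List[Box]], str],
    translate: Callable[[str, str], str],
    review: Callable[[LocalizedAd], bool],
) -> LocalizedAd:
    # 1. Scene text detection: find text regions in the source ad.
    regions = detect(image)
    # 2. Inpainting: erase the original text to recover a clean background.
    background = inpaint(image, [r.box for r in regions])
    # 3. Machine translation: translate each region's text for the locale.
    for region in regions:
        region.translated_text = translate(region.source_text, locale)
    # 4. Text reimposition: keep each translation tied to its original box
    #    so it can be rendered back at the same position.
    draft = LocalizedAd(locale=locale, background=background, regions=regions)
    # 5. Human oversight: a reviewer accepts or rejects the draft.
    draft.approved = review(draft)
    return draft


# Stand-in stages, for illustration only (assumptions, not real models).
def fake_detect(image: str) -> List[TextRegion]:
    return [TextRegion(box=(10, 10, 200, 40), source_text="Big Sale")]


def fake_inpaint(image: str, boxes: List[Box]) -> str:
    return f"{image}:inpainted"


def fake_translate(text: str, locale: str) -> str:
    return {("Big Sale", "fr-FR"): "Grande vente"}[(text, locale)]


def fake_review(draft: LocalizedAd) -> bool:
    # A real reviewer would judge layout and style; this only checks that
    # every region received a translation.
    return all(r.translated_text for r in draft.regions)


ad = localize("ad.png", "fr-FR", fake_detect, fake_inpaint,
              fake_translate, fake_review)
```

Passing the stages in as callables keeps the control flow separate from any particular model choice, which is one plausible way to let automated components and human oversight be swapped independently.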