ATD-Trans: A Geographically Grounded Japanese-English Travelogue Translation Dataset

📅 2026-05-12

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the lack of fine-grained, geographically grounded evaluation benchmarks for machine translation of geographic texts such as travelogues. Focusing on the Japanese–English language pair, it introduces ATD-Trans, the first parallel travelogue corpus annotated with geographic entities. The dataset enables translation quality assessment at both the document and geographic-entity levels, and distinguishes performance on domestic (within Japan) versus overseas locations. Building on this resource, the authors propose an evaluation framework that integrates geographic entity recognition and cross-lingual alignment. Comparative experiments show that Japanese-enhanced models achieve better overall performance, and that geographic entities in domestic regions are harder to translate than those overseas. The work establishes a benchmark for multilingual geographic text processing and supports fine-grained analysis of cross-lingual geographic understanding.

📝 Abstract

Geographic text, or textual data rich in geographic (geo-) information, is a valuable source for various geographic applications, e.g., tourism management. Making such information accessible to speakers of other languages further enhances its utility; thus, accurate machine translation (MT) is essential for equity in multilingual geo-information access. To facilitate in-depth analysis of geographic text, we introduce ATD-Trans, a geographically grounded Japanese–English travelogue translation dataset, which enables evaluation of MT quality at both the overall and geo-entity levels across domestic (within Japan) and overseas regions. Our experiments on existing language models examine two factors: model language focus and geographic region. The results highlight the advantages of Japanese-enhanced models and the greater difficulty of translating domestic-region geo-entities mentioned in travel blogs.
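
To make the idea of geo-entity-level evaluation split by region concrete, here is a minimal sketch in Python. It is not the paper's metric or data format: the record keys (`hypothesis`, `entities`), the region labels, and the case-insensitive substring match are all assumptions chosen for illustration.

```python
from collections import defaultdict


def geo_entity_accuracy(examples):
    """Per-region share of reference geo-entity names found in the MT output.

    Assumed (hypothetical) record format:
      "hypothesis": the system's English translation (str)
      "entities":   list of (reference_english_name, region) pairs,
                    with region being "domestic" or "overseas"
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for ex in examples:
        hyp = ex["hypothesis"].lower()
        for ref_name, region in ex["entities"]:
            totals[region] += 1
            # Assumption: an entity counts as correct if its reference
            # English name appears verbatim (ignoring case) in the output.
            if ref_name.lower() in hyp:
                hits[region] += 1
    return {region: hits[region] / totals[region] for region in totals}


# Toy, made-up examples (not taken from ATD-Trans).
examples = [
    {
        "hypothesis": "We walked from Kyoto Station to Kiyomizu Temple.",
        "entities": [("Kyoto Station", "domestic"), ("Kiyomizu-dera", "domestic")],
    },
    {
        "hypothesis": "The next day we flew to Paris.",
        "entities": [("Paris", "overseas")],
    },
]

scores = geo_entity_accuracy(examples)
print(scores)
```

Exact matching is deliberately strict: it penalizes acceptable variants such as "Kiyomizu Temple" for "Kiyomizu-dera", so a real evaluation would likely need alias lists or fuzzy matching alongside an overall document-level score.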

Problem

Research questions and friction points this paper is trying to address.

geographic text

machine translation

Japanese-English translation

geo-entity

travelogue

Innovation

Methods, ideas, or system contributions that make the work stand out.

geographically grounded translation

Japanese-English travelogue dataset

geo-entity translation