Time Travel: A Comprehensive Benchmark to Evaluate LMMs on Historical and Cultural Artifacts

📅 2025-02-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the lack of a standardized benchmark for evaluating large multimodal models (LMMs) on historical and cultural artifacts. To this end, we introduce TimeTravel, a benchmark of 10,250 expert-verified samples spanning 266 distinct cultures across 10 major historical regions, supporting analysis of manuscripts, artworks, inscriptions, and archaeological discoveries. The benchmark pairs a structured dataset with an evaluation framework that assesses models on classification, interpretation, and historical comprehension. Evaluation of contemporary LMMs highlights their strengths and identifies areas for improvement. We publicly release the code to support AI-assisted historical research and cultural heritage preservation.

📝 Abstract

Understanding historical and cultural artifacts demands human expertise and advanced computational techniques, yet the process remains complex and time-intensive. While large multimodal models offer promising support, their evaluation and improvement require a standardized benchmark. To address this, we introduce TimeTravel, a benchmark of 10,250 expert-verified samples spanning 266 distinct cultures across 10 major historical regions. Designed for AI-driven analysis of manuscripts, artworks, inscriptions, and archaeological discoveries, TimeTravel provides a structured dataset and robust evaluation framework to assess AI models' capabilities in classification, interpretation, and historical comprehension. By integrating AI with historical research, TimeTravel fosters AI-powered tools for historians, archaeologists, researchers, and cultural tourists to extract valuable insights while ensuring technology contributes meaningfully to historical discovery and cultural heritage preservation. We evaluate contemporary AI models on TimeTravel, highlighting their strengths and identifying areas for improvement. Our goal is to establish AI as a reliable partner in preserving cultural heritage, ensuring that technological advancements contribute meaningfully to historical discovery. Our code is available at: https://github.com/mbzuai-oryx/TimeTravel.
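As a rough illustration of what evaluating a model across regions and tasks can look like, the sketch below computes exact-match accuracy per group. It is not the TimeTravel evaluation code: the record fields (`region`, `task`, `label`, `prediction`), the placeholder values, and the exact-match scoring rule are all assumptions made for this example.

```python
from collections import defaultdict


def accuracy_by_group(records, key):
    """Return {group: accuracy}, grouping records by the given field.

    A prediction counts as correct on a case-insensitive exact match,
    which is an assumed scoring rule, not the benchmark's own metric.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        group = record[key]
        total[group] += 1
        if record["prediction"].strip().lower() == record["label"].strip().lower():
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}


# Toy records invented for illustration; field names and values are hypothetical.
records = [
    {"region": "region_a", "task": "classification", "label": "Amulet", "prediction": "amulet"},
    {"region": "region_a", "task": "classification", "label": "Stele", "prediction": "Vase"},
    {"region": "region_b", "task": "classification", "label": "Amphora", "prediction": "Amphora"},
]

per_region = accuracy_by_group(records, "region")
per_task = accuracy_by_group(records, "task")
```

Grouping by a field name keeps the same function usable for regions, cultures, or tasks. Open-ended interpretation tasks would need a softer metric than exact match.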

Problem

Research questions and friction points this paper is trying to address.

Evaluates multimodal models on historical artifacts

Provides benchmark for AI in cultural analysis

Supports AI-assisted heritage preservation and historical discovery

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal benchmark for historical analysis

Structured dataset with expert-verified samples

AI evaluation framework for cultural preservation

🔎 Similar Papers

Have Large Vision-Language Models Mastered Art History?