Multi-Modal Semantic Parsing for the Interpretation of Tombstone Inscriptions

📅 2025-07-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
Tombstones encapsulate individual biographies, collective memory, and artistic heritage, yet they suffer severe information loss from natural weathering, human vandalism, and sociopolitical transformations. To address this, we propose the first multimodal large language model framework designed specifically for tombstone understanding. Our method uses a vision-language model (VLM) to perform end-to-end parsing from tombstone imagery to structured Tombstone Meaning Representations (TMRs), augmented by retrieval-augmented generation (RAG) that incorporates external knowledge, such as toponyms and occupational terms, to strengthen cross-cultural and multilingual semantic interpretation. Compared to conventional OCR-based approaches, the framework raises the parsing F1 score from 36.1 to 89.5 and remains robust under simulated physical degradation. This work establishes a scalable technical paradigm for the digital preservation, semantic organization, and intelligent retrieval of endangered tombstone heritage.
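
The summary does not reproduce the paper's prompts, TMR schema, or choice of VLM, but the described pipeline (tombstone image in, structured TMR out, with retrieved toponym/occupation knowledge injected into the prompt) can be sketched roughly as follows. The schema fields, prompt wording, and the `query_gazetteer` helper are illustrative assumptions, and the OpenAI-style client merely stands in for whatever model the authors actually used.

```python
# Minimal sketch of a VLM + RAG tombstone-parsing pipeline (assumptions, not the paper's code).
import base64
import json

from openai import OpenAI

client = OpenAI()

TMR_SCHEMA_HINT = """Return JSON with keys:
  persons: [{name, birth_date, death_date, birth_place, occupation}],
  epitaph: string, symbols: [string], language: string"""


def query_gazetteer(terms):
    """Placeholder for RAG retrieval of toponyms / occupation codes from an external knowledge base."""
    return [f"(no external entry found for '{t}')" for t in terms]


def parse_tombstone(image_path, retrieved_context=""):
    """Send the tombstone photo plus any retrieved knowledge to the VLM and parse the JSON reply."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    response = client.chat.completions.create(
        model="gpt-4o",  # stand-in for the VLM used in the paper
        response_format={"type": "json_object"},
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": f"Parse this tombstone into a Tombstone Meaning Representation.\n"
                         f"{TMR_SCHEMA_HINT}\nExternal knowledge:\n{retrieved_context}"},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    return json.loads(response.choices[0].message.content)


# Usage: a first pass without context, then a second pass with retrieved toponym/occupation entries.
draft = parse_tombstone("grave_001.jpg")
context = "\n".join(query_gazetteer([p.get("birth_place", "") for p in draft.get("persons", [])]))
tmr = parse_tombstone("grave_001.jpg", retrieved_context=context)
print(json.dumps(tmr, indent=2, ensure_ascii=False))
```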

📝 Abstract
Tombstones are historically and culturally rich artifacts, encapsulating individual lives, community memory, historical narratives, and artistic expression. Yet many tombstones today face significant preservation challenges, including physical erosion, vandalism, environmental degradation, and political shifts. In this paper, we introduce a novel multi-modal framework for tombstone digitization, aiming to improve the interpretation, organization, and retrieval of tombstone content. Our approach leverages vision-language models (VLMs) to translate tombstone images into structured Tombstone Meaning Representations (TMRs), capturing both image and text information. To further enrich semantic parsing, we incorporate retrieval-augmented generation (RAG) to integrate externally dependent elements such as toponyms, occupation codes, and ontological concepts. Compared to traditional OCR-based pipelines, our method improves parsing accuracy from an F1 score of 36.1 to 89.5. We additionally evaluate the model's robustness across diverse linguistic and cultural inscriptions, and simulate physical degradation through image fusion to assess performance under noisy or damaged conditions. Our work represents the first attempt to formalize tombstone understanding using large vision-language models, presenting implications for heritage preservation.
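
The abstract mentions simulating physical degradation through image fusion but does not describe the exact procedure here. A common way to realize this is to alpha-blend a weathering or noise texture over the inscription photo; the texture file and severity values below are illustrative assumptions, not the paper's recipe.

```python
# Rough sketch of degradation simulation by image fusion (assumed approach).
from PIL import Image


def simulate_weathering(tombstone_path, texture_path, severity=0.4):
    """Alpha-blend a weathering/erosion texture over a tombstone photo.

    severity in [0, 1]: 0 leaves the photo untouched, 1 shows only the texture.
    """
    stone = Image.open(tombstone_path).convert("RGB")
    texture = Image.open(texture_path).convert("RGB").resize(stone.size)
    return Image.blend(stone, texture, alpha=severity)


# Generate a small severity sweep to probe how parsing accuracy decays with damage.
for severity in (0.2, 0.4, 0.6):
    degraded = simulate_weathering("grave_001.jpg", "erosion_texture.jpg", severity)
    degraded.save(f"grave_001_degraded_{int(severity * 100)}.jpg")
```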
Problem

Research questions and friction points this paper is trying to address.

Digitizing tombstones to preserve historical and cultural data
Improving interpretation of tombstone content using multi-modal models
Enhancing parsing accuracy for damaged or diverse inscriptions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses vision-language models for tombstone digitization
Incorporates retrieval-augmented generation for semantic parsing
Improves parsing accuracy substantially over OCR-based methods (F1 from 36.1 to 89.5); a scoring sketch follows this list
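
How the F1 score is computed over TMRs is not spelled out in this summary. A reasonable reading is slot-level matching between predicted and gold key-value pairs; the sketch below assumes that interpretation and flat TMR dictionaries.

```python
# Slot-level F1 over flattened TMR key-value pairs (assumed evaluation, not the paper's exact metric).
def tmr_f1(predicted: dict, gold: dict) -> float:
    pred_pairs = set(predicted.items())
    gold_pairs = set(gold.items())
    if not pred_pairs or not gold_pairs:
        return 0.0
    true_pos = len(pred_pairs & gold_pairs)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred_pairs)
    recall = true_pos / len(gold_pairs)
    return 2 * precision * recall / (precision + recall)


# Example: one wrong slot out of three yields F1 = 0.667.
gold = {"name": "Jan de Vries", "death_date": "1884-03-02", "occupation": "carpenter"}
pred = {"name": "Jan de Vries", "death_date": "1884-03-02", "occupation": "farmer"}
print(round(tmr_f1(pred, gold), 3))
```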