Multimodal Cultural Heritage Knowledge Graph Extension with Language and Vision Models

📅 2026-05-17

🤖 AI Summary

This study addresses key challenges in constructing cultural heritage knowledge graphs, particularly the difficulty of fusing multimodal data and the inefficiency of extracting information from unstructured sources. The authors present WJoconde, a new multimodal knowledge graph of French cultural heritage that integrates both textual and visual data about its entities, together with three variants intended for downstream tasks such as knowledge graph completion (KGC). They propose an automated knowledge extraction and validation framework that uses large language models (LLMs) and vision-language models (VLMs) and grounds the outputs of both models, enabling reliable knowledge expansion. The authors also build a benchmark of KGC methods on the dataset and report that integrating text and image information extends the knowledge graph efficiently and with high reliability. To support further research and reproducibility, they publicly release all source code, the multimodal benchmark datasets, and the original data with an interactive access point.
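The summary mentions a validation step that grounds model outputs before they enter the graph, but does not say how it works. As a rough illustration only, the sketch below assumes a hypothetical rule: a candidate triple proposed by the LLM (from a text description) or by the VLM (from an image) is accepted if its object value appears as a whole phrase in the source description, or if both models independently propose the same triple. The `Triple` class, the `validate` function, the entity identifier `obj:001`, and the acceptance rule itself are assumptions for illustration, not the authors' pipeline.

```python
import re
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str


def normalize(text: str) -> str:
    # Lowercase, turn punctuation into spaces, collapse whitespace (keeps accented letters).
    return " ".join(re.sub(r"\W+", " ", text.lower()).split())


def is_grounded_in_text(triple: Triple, source_text: str) -> bool:
    # Hypothetical check: the object must occur as a whole-word phrase in the source text.
    return f" {normalize(triple.obj)} " in f" {normalize(source_text)} "


def validate(
    llm_triples: Sequence[Triple], vlm_triples: Sequence[Triple], source_text: str
) -> List[Triple]:
    def key(t: Triple):
        return (t.subject, t.predicate, normalize(t.obj))

    llm_keys = {key(t) for t in llm_triples}
    vlm_keys = {key(t) for t in vlm_triples}
    accepted: List[Triple] = []
    seen = set()
    for t in [*llm_triples, *vlm_triples]:
        k = key(t)
        if k in seen:  # the same fact proposed twice is judged only once
            continue
        seen.add(k)
        both_models_agree = k in llm_keys and k in vlm_keys
        # Hypothetical acceptance rule: grounded in the text OR confirmed by both models.
        if is_grounded_in_text(t, source_text) or both_models_agree:
            accepted.append(t)
    return accepted


source_text = "Portrait painted in oil on a poplar panel by Leonardo da Vinci."
llm_triples = [
    Triple("obj:001", "creator", "Leonardo da Vinci"),  # stated in the text
    Triple("obj:001", "material", "poplar panel"),      # stated in the text
    Triple("obj:001", "date", "1503"),                  # not in the text, LLM only
    Triple("obj:001", "genre", "portrait painting"),    # not verbatim, but the VLM agrees
]
vlm_triples = [
    Triple("obj:001", "genre", "Portrait painting"),
    Triple("obj:001", "depicts", "landscape background"),  # VLM only, not in the text
]

accepted = validate(llm_triples, vlm_triples, source_text)
print([t.predicate for t in accepted])  # ['creator', 'material', 'genre']
```

A literal-match rule like this is deliberately conservative: it rejects plausible but unsupported values (the date here), at the cost of also rejecting correct facts that the source phrases differently.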

📝 Abstract

The preservation and interpretation of cultural heritage increasingly rely on digital technologies, among which Knowledge Graphs (KGs) stand out for their ability to structure vast amounts of data. However, the construction and expansion of these KGs often face challenges due to the diverse and complex nature of cultural heritage information. In this paper, we propose a novel approach for extending KG resources in the domain of cultural heritage, which we apply to French data. First, we introduce a new knowledge graph in the domain of French cultural heritage, WJoconde, which is distinguished by its multimodality, as it integrates both textual and image information about the entities. We further introduce three variants of WJoconde to facilitate downstream research, such as Knowledge Graph Completion (KGC). We also build a comprehensive benchmark for KGC methods on our dataset. Second, we propose a new framework for extending cultural heritage KGs using multimodal approaches that leverage Large Language Models (LLMs) and Vision-Language Models (VLMs). The framework combines automated data extraction from unstructured resources with a dedicated validation pipeline that grounds the outputs of both models, and we use it to further extend WJoconde. Our results show that by integrating the rich text and image information in cultural heritage data, we can efficiently enhance KGs with high reliability. We open-source all code and benchmark datasets with text and images, as well as the original data with an interactive access point.
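The abstract mentions a KGC benchmark without naming its metrics. Link-prediction benchmarks conventionally report Mean Reciprocal Rank (MRR) and Hits@k under the "filtered" setting, so the sketch below assumes those; whether WJoconde's benchmark uses exactly these is not stated here. The function names and all scores and ranks are hypothetical.

```python
from typing import Iterable, Sequence, Set


def filtered_rank(scores: Sequence[float], true_idx: int, known_true: Set[int]) -> int:
    """Rank of the correct entity among all candidates, skipping other entities
    already known to be correct answers (the usual "filtered" setting).
    Ties are broken in favour of the correct entity, which is optimistic."""
    target = scores[true_idx]
    better = sum(
        1
        for i, s in enumerate(scores)
        if i != true_idx and i not in known_true and s > target
    )
    return better + 1


def mean_reciprocal_rank(ranks: Iterable[int]) -> float:
    ranks = list(ranks)
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_k(ranks: Iterable[int], k: int) -> float:
    # Fraction of queries whose correct answer is ranked within the top k.
    ranks = list(ranks)
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Hypothetical model scores for one test query over four candidate entities.
scores = [0.9, 0.7, 0.8, 0.1]
rank = filtered_rank(scores, true_idx=1, known_true={0})
print(rank)  # 2: entity 0 is filtered out, so only entity 2 outranks the answer

# Hypothetical ranks collected over four test queries.
ranks = [1, 2, 4, 10]
print(round(mean_reciprocal_rank(ranks), 4))  # 0.4625
print(hits_at_k(ranks, 1), hits_at_k(ranks, 3), hits_at_k(ranks, 10))  # 0.25 0.5 1.0
```

Filtering matters for a dense graph like a museum catalogue, where one query (for example, the material of an object) can have several valid answers; without it, a model is penalised for ranking another correct answer first.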

Problem

Research questions and friction points this paper is trying to address.

Cultural Heritage

Knowledge Graph Extension

Multimodal Data

Knowledge Graph Completion

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal Knowledge Graph

Cultural Heritage

Large Language Models

Vision-Language Models