HERITRACE: a domain-agnostic framework for SHACL-driven RDF curation with provenance and change tracking

📅 2026-05-03

🤖 AI Summary
This work addresses a problem faced by users without Semantic Web expertise who need to edit RDF data efficiently and traceably: existing tools do not track the provenance of changes automatically. The authors propose a domain-agnostic, SHACL-based framework for RDF curation. It generates form interfaces automatically from SHACL shapes and YAML display configurations. The application connects directly to existing SPARQL endpoints, so data is edited in place without migration. Every modification is recorded in RDF as a provenance snapshot, which produces a queryable change history that supports auditing, rollback, and recovery of earlier versions. The result is a traceable RDF editing workflow aimed at non-technical users. The framework is open source and is deployed in production in the ParaText project. It works with any SPARQL-accessible RDF store and is planned as the editing interface for OpenCitations and as a curation layer within the GRAPHIA Horizon Europe project.
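As a rough illustration of how SHACL constraints can drive form generation, the sketch below maps the cardinality and datatype constraints of property shapes to form-field descriptors. It uses the SHACL Core terms `sh:path`, `sh:minCount`, `sh:maxCount`, `sh:datatype` and `sh:name`. The sketch makes several assumptions. The shapes are taken to be already parsed into plain Python dictionaries. The `display_rules` dictionary is a hypothetical stand-in for the parsed YAML display configuration. `FormField`, `fields_from_shape` and the datatype-to-widget mapping are illustrative names and choices, not HERITRACE's actual behavior.

```python
import re
from dataclasses import dataclass

XSD = "http://www.w3.org/2001/XMLSchema#"

# Illustrative (assumed) mapping from XSD datatypes to HTML input types;
# any other or missing datatype falls back to a plain text input.
INPUT_TYPES = {
    XSD + "date": "date",
    XSD + "integer": "number",
    XSD + "anyURI": "url",
}


@dataclass(frozen=True)
class FormField:
    path: str          # predicate IRI the field edits (sh:path)
    label: str         # human-readable label shown in the form
    required: bool     # derived from sh:minCount >= 1
    repeatable: bool   # derived from sh:maxCount absent or > 1
    input_type: str    # widget chosen from sh:datatype


def fields_from_shape(property_shapes, display_rules=None):
    """Turn pre-parsed SHACL property shapes into form-field descriptors.

    `display_rules` is a hypothetical {path: label} dict standing in for
    YAML display configuration; it overrides sh:name when present.
    """
    display_rules = display_rules or {}
    fields = []
    for ps in property_shapes:
        path = ps["sh:path"]
        min_count = ps.get("sh:minCount", 0)
        max_count = ps.get("sh:maxCount")  # absent means unbounded in SHACL
        # Use sh:name if given, otherwise the IRI's local name.
        default_label = ps.get("sh:name", re.split(r"[/#]", path)[-1])
        fields.append(FormField(
            path=path,
            label=display_rules.get(path, default_label),
            required=min_count >= 1,
            repeatable=max_count is None or max_count > 1,
            input_type=INPUT_TYPES.get(ps.get("sh:datatype"), "text"),
        ))
    return fields


# Example shape for a bibliographic resource (IRIs chosen for illustration).
shape = [
    {"sh:path": "http://purl.org/dc/terms/title",
     "sh:datatype": XSD + "string", "sh:minCount": 1, "sh:maxCount": 1},
    {"sh:path": "http://purl.org/dc/terms/creator", "sh:minCount": 0},
    {"sh:path": "http://prismstandard.org/namespaces/basic/2.0/publicationDate",
     "sh:datatype": XSD + "date", "sh:maxCount": 1},
]
rules = {"http://purl.org/dc/terms/creator": "Author"}

fields = fields_from_shape(shape, rules)
for f in fields:
    print(f.label, f.input_type, f.required, f.repeatable)
```

Keeping the data model in SHACL and only the presentation in a separate configuration means a new collection can change labels or widgets without touching the validation constraints, which matches the "shapes plus display rules, no code changes" idea described in the abstract.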
📝 Abstract
HERITRACE is an open-source web application that enables users without Semantic Web expertise to curate RDF data through form-based interfaces with automatic provenance documentation and change tracking in RDF. It uses SHACL for data model definition and form generation, connects to existing SPARQL-accessible stores without data migration, and records every modification as a provenance snapshot that can be browsed and restored. HERITRACE is domain-agnostic: adapting it to a new collection requires only SHACL shapes and YAML display rules, without code changes. This paper describes the software architecture and provides the first empirical evaluation. HERITRACE is deployed in production for the ParaText project, where classical philologists curate bibliographic data about ancient Greek exegetical traditions, and is planned as the editing interface for OpenCitations and as the curation layer for the Social Sciences and Humanities Citation Index within the GRAPHIA Horizon Europe project. Since it operates on any SPARQL-accessible store without data migration, its adoption potential extends to any domain maintaining RDF data. HERITRACE is publicly available on GitHub under the ISC license, archived on Zenodo and in the Software Heritage archive, and documented for deployment with a pre-built Docker image.
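The idea of recording every modification as a snapshot that can be browsed and restored can be illustrated with a change log of triple-level deltas. The following is a minimal, stdlib-only Python sketch and not HERITRACE's actual provenance model. Each modification stores the triples it removed and added, together with the agent and a timestamp. Each delta can be rendered as a SPARQL 1.1 Update request (`DELETE DATA` / `INSERT DATA`) of the kind that could be sent to an existing endpoint. Any earlier state is rebuilt by undoing later deltas, and a restore is recorded as a new modification, so history is never rewritten. The class and method names (`Snapshot`, `EntityHistory`, `state_at`, `restore`) and the example IRIs are hypothetical. Triples are assumed to be written as N-Triples terms.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

Triple = tuple[str, str, str]  # (subject, predicate, object) as N-Triples terms


@dataclass(frozen=True)
class Snapshot:
    generated_at: str
    agent: str                    # who made the change (provenance)
    removed: frozenset[Triple]
    added: frozenset[Triple]

    def to_sparql_update(self) -> str:
        """Render the delta as a SPARQL 1.1 Update request."""
        def block(triples):
            return "\n".join(f"  {s} {p} {o} ." for s, p, o in sorted(triples))
        ops = []
        if self.removed:
            ops.append(f"DELETE DATA {{\n{block(self.removed)}\n}}")
        if self.added:
            ops.append(f"INSERT DATA {{\n{block(self.added)}\n}}")
        return " ;\n".join(ops)  # operations in one request are joined by ';'


@dataclass
class EntityHistory:
    current: set[Triple]
    snapshots: list[Snapshot] = field(default_factory=list)

    def modify(self, new_state, agent: str) -> Snapshot:
        """Replace the entity's triples and record the delta as a snapshot."""
        new_state = set(new_state)
        snap = Snapshot(
            generated_at=datetime.now(timezone.utc).isoformat(),
            agent=agent,
            removed=frozenset(self.current - new_state),
            added=frozenset(new_state - self.current),
        )
        self.snapshots.append(snap)
        self.current = new_state
        return snap

    def state_at(self, version: int) -> set[Triple]:
        """State after `version` changes (0 = before any recorded change)."""
        state = set(self.current)
        for snap in reversed(self.snapshots[version:]):
            state = (state - snap.added) | snap.removed  # undo one delta
        return state

    def restore(self, version: int, agent: str) -> Snapshot:
        """Roll back by recording a new change, leaving history intact."""
        return self.modify(self.state_at(version), agent)


WORK = "<http://example.org/work/1>"
TITLE = "<http://purl.org/dc/terms/title>"
EDITOR = "http://example.org/agent/editor"

history = EntityHistory({(WORK, TITLE, '"Scholia"')})
edit = history.modify({(WORK, TITLE, '"Scholia in Homerum"')}, EDITOR)
print(edit.to_sparql_update())
history.restore(0, EDITOR)
print(history.current)
```

Storing deltas instead of full copies keeps each snapshot small and maps directly onto update requests against the live store. The trade-off is that reconstructing an old state means replaying deltas backward from the current one.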
Problem

Research questions and friction points this paper is trying to address.

RDF curation
provenance tracking
change tracking
domain-agnostic
SHACL
Innovation

Methods, ideas, or system contributions that make the work stand out.

SHACL-driven curation
provenance tracking
RDF change management
domain-agnostic framework
form-based RDF editing