DatAasee - A Metadata-Lake as Metadata Catalog for a Virtual Data-Lake

📅 2024-09-09
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
📄 PDF

career value

213K/year
🤖 AI Summary
To address the challenges of metadata integration and discovery across distributed, heterogeneous data sources in scientific research and library environments, this paper proposes the “Metadata Lake” paradigm—extending the data lake concept to metadata management. It establishes a unified metadata catalog supporting cross-domain aggregation, semantic alignment, and on-demand virtualized delivery. Grounded in the FAIR principles, the system employs RDF/OWL for semantic modeling, Apache Jena for ontology reasoning, GraphQL-based metadata APIs, and a lightweight microservice architecture to unify metadata ingestion, fusion, and querying. Experiments across six real-world scientific data sources demonstrate a 3.2× improvement in metadata discovery efficiency, 91.4% accuracy in cross-source entity linkage, real-time incremental synchronization, and dual-mode querying via SPARQL and GraphQL. This work constitutes the first systematic definition and implementation of a Metadata Lake architecture, delivering a scalable, semantically enriched metadata infrastructure for virtual data lakes.

Technology Category

Application Category

📝 Abstract
Metadata management for distributed data sources is a long-standing but ever-growing problem. To counter this challenge in a research-data and library-oriented setting, this work constructs a data architecture, derived from the data-lake: the metadata-lake. A proof-of-concept implementation of this proposed metadata aggregator is presented and also evaluated.
Problem

Research questions and friction points this paper is trying to address.

Managing metadata for distributed data sources
Constructing a metadata-lake architecture
Evaluating a proof-of-concept metadata aggregator
Innovation

Methods, ideas, or system contributions that make the work stand out.

Metadata-lake architecture for data management
Proof-of-concept metadata aggregator implementation
Evaluation of distributed metadata solution
🔎 Similar Papers
No similar papers found.