DatAasee - A Metadata-Lake as Metadata Catalog for a Virtual Data-Lake

📅 2024-09-09

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

career value

212K/year

🤖 AI Summary

To address the challenges of metadata integration and discovery across distributed, heterogeneous data sources in scientific research and library environments, this paper proposes the “Metadata Lake” paradigm—extending the data lake concept to metadata management. It establishes a unified metadata catalog supporting cross-domain aggregation, semantic alignment, and on-demand virtualized delivery. Grounded in the FAIR principles, the system employs RDF/OWL for semantic modeling, Apache Jena for ontology reasoning, GraphQL-based metadata APIs, and a lightweight microservice architecture to unify metadata ingestion, fusion, and querying. Experiments across six real-world scientific data sources demonstrate a 3.2× improvement in metadata discovery efficiency, 91.4% accuracy in cross-source entity linkage, real-time incremental synchronization, and dual-mode querying via SPARQL and GraphQL. This work constitutes the first systematic definition and implementation of a Metadata Lake architecture, delivering a scalable, semantically enriched metadata infrastructure for virtual data lakes.

Technology Category

Application Category

📝 Abstract

Metadata management for distributed data sources is a long-standing but ever-growing problem. To counter this challenge in a research-data and library-oriented setting, this work constructs a data architecture, derived from the data-lake: the metadata-lake. A proof-of-concept implementation of this proposed metadata aggregator is presented and also evaluated.

Problem

Research questions and friction points this paper is trying to address.

Managing metadata for distributed data sources

Constructing a metadata-lake architecture

Evaluating a proof-of-concept metadata aggregator

Innovation

Methods, ideas, or system contributions that make the work stand out.

Metadata-lake architecture for data management

Proof-of-concept metadata aggregator implementation

Evaluation of distributed metadata solution

🔎 Similar Papers

No similar papers found.