Advancing Polyglot Big Data Processing using the Hadoop ecosystem

📅 2025-04-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenges of heterogeneous engine coordination, semantic fragmentation, and inefficient scheduling in multilingual big data processing, this paper proposes the first unified polyglot data processing framework built on the Hadoop ecosystem. The framework leverages YARN as its resource management foundation and integrates HDFS, Spark, Flink, Kafka, HBase, and a custom DSL-driven hybrid execution engine. It introduces a novel semantics-aware component coordination mechanism and scenario-adaptive orchestration strategy. Evaluated on real-world smart city and social network workloads, the framework achieves an average 37% reduction in end-to-end latency and a 2.1× improvement in resource utilization—demonstrating the efficacy of cross-engine semantic alignment and dynamic collaborative scheduling. This work establishes a systematic architectural paradigm and provides empirical validation for evolving the Hadoop ecosystem into a unified, multi-paradigm, polyglot data processing platform.
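The summary's "scenario-adaptive orchestration" can be pictured as a routing policy that sends each workload to the engine best suited to its profile. The sketch below is purely illustrative: the `Workload` fields, thresholds, engine labels, and `route_workload` function are assumptions for exposition, not the paper's actual API or scheduling logic.

```python
# Toy sketch of scenario-adaptive engine routing: pick an execution engine
# per workload profile. All names and thresholds here are illustrative
# assumptions, not the paper's actual implementation.
from dataclasses import dataclass

@dataclass
class Workload:
    streaming: bool          # unbounded input (e.g. a Kafka topic)?
    latency_budget_ms: int   # end-to-end latency target
    state_size_gb: float     # working state the job must keep

def route_workload(w: Workload) -> str:
    """Return the engine label a hypothetical coordinator would choose."""
    if w.streaming and w.latency_budget_ms < 1000:
        return "flink"       # low-latency, event-at-a-time streaming
    if w.streaming:
        return "spark"       # micro-batch streaming is acceptable
    if w.state_size_gb > 100:
        return "spark"       # large batch jobs on YARN-managed Spark
    return "hybrid-dsl"      # small jobs go to the custom DSL engine

print(route_workload(Workload(streaming=True, latency_budget_ms=50, state_size_gb=1.0)))
# prints: flink
```

A real coordinator would of course consult richer signals (cluster load, data locality in HDFS, queue state), but the decision structure is the same: classify the scenario, then dispatch to the matching engine.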

📝 Abstract
This article explores the use of the Hadoop ecosystem as a polyglot big data processing platform, focusing on the integration of diverse computation and storage technologies and their potential advantages in particular computational contexts. It examines the ecosystem's promise as a unified platform, highlighting its architectural foundations and their complementary strengths in distributed storage, processing efficiency, and real-time analytics. The article then explores use cases in domains such as Smart Cities and Social Networks, illustrating how the platform's diverse components can be orchestrated in a polyglot manner and how these fields can benefit from the ecosystem's capabilities. Finally, it outlines avenues for future research, including specialized architectural aspects of the ecosystem that could advance the polyglot paradigm.
Problem

Research questions and friction points this paper is trying to address.

Integrating diverse technologies in Hadoop for big data processing
Exploring Hadoop's strengths in storage, efficiency, and real-time analytics
Applying polyglot Hadoop in Smart Cities and Social Networks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hadoop ecosystem for polyglot big data processing
Integration of diverse computation and storage technologies
Unified platform for distributed storage and real-time analytics
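One concrete way such a unified platform keeps batch and real-time engines from starving each other is YARN's CapacityScheduler. The fragment below is an illustrative `capacity-scheduler.xml` sketch, not a configuration from the paper: the queue names and capacity split are assumptions showing how Spark batch jobs and Flink streaming jobs could share one cluster.

```xml
<!-- Illustrative capacity-scheduler.xml fragment (queue names and
     percentages are assumed, not taken from the paper): split cluster
     capacity between a batch queue and a streaming queue. -->
<configuration>
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>batch,streaming</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.batch.capacity</name>
    <value>60</value>
    <!-- 60% of cluster resources for Spark batch jobs -->
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.streaming.capacity</name>
    <value>40</value>
    <!-- 40% reserved for latency-sensitive Flink streaming jobs -->
  </property>
</configuration>
```

Queue capacities under one parent must sum to 100; jobs are then submitted with the matching queue name so each engine's workloads draw from its own share.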