Design and Implementation of a Serverless MapReduce Framework for Scalable Data Pipelines

📅 2026-05-28

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work proposes a serverless, real-time data processing framework that combines the MapReduce programming model with Function-as-a-Service (FaaS) to handle the large, continuous data streams produced by modern logistics systems. Built on Kubernetes and Knative, the framework uses an event-driven architecture of five loosely coupled services that receive, process, and aggregate incoming data. Apache Kafka carries events between the services, Redis stores workflow metadata, and AWS S3 provides durable storage. Pairing the MapReduce model with FaaS-style elasticity lets the system scale with workload demand and scale to zero when there is no work. Experimental evaluation shows that the system scales effectively as input data volume grows while supporting scale-to-zero, on-demand processing.
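
The on-demand scaling and scale-to-zero behavior mentioned above can be illustrated with a deliberately simplified replica calculation. This is a sketch under stated assumptions, not the paper's implementation and not Knative's actual autoscaling algorithm: the function name `desired_replicas` and its parameters (`in_flight`, `target_per_replica`, `max_replicas`) are hypothetical and chosen only for illustration.

```python
import math


def desired_replicas(in_flight: int, target_per_replica: int, max_replicas: int) -> int:
    """Return how many function replicas to run for the current load.

    Hypothetical, simplified policy: one replica per `target_per_replica`
    in-flight events, capped at `max_replicas`, and zero replicas when
    there is no work (scale-to-zero).
    """
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    if in_flight <= 0:
        return 0  # nothing to process: release all replicas
    return min(max_replicas, math.ceil(in_flight / target_per_replica))


# Example loads: idle, moderate, and a burst that hits the cap.
idle = desired_replicas(in_flight=0, target_per_replica=10, max_replicas=5)
moderate = desired_replicas(in_flight=25, target_per_replica=10, max_replicas=5)
burst = desired_replicas(in_flight=1000, target_per_replica=10, max_replicas=5)
```

With these inputs the idle case yields no replicas, the moderate case rounds up to three, and the burst is held at the configured cap of five.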

📝 Abstract

Modern logistics systems generate continuous streams of data from sources such as GPS, IoT sensors, and logistics management systems. Aggregating, processing, and analyzing this data has become vital for monitoring operations, optimizing efficiency, and responding quickly to decision-making tasks. This paper presents an event-driven MapReduce framework for real-time data processing in logistics environments. The system runs on Kubernetes with Knative and uses Apache Kafka as the backbone for communication between components. The platform comprises five loosely coupled services that receive, process, and aggregate incoming data in real time. Redis preserves workflow metadata, while AWS S3 provides persistent storage for the framework. The design is inspired by the MapReduce programming model and integrates Function-as-a-Service (FaaS) principles with distributed processing techniques, allowing configurable scaling based on workload demands and the underlying hardware. Experimental evaluation shows that the system scales effectively as the input data volume increases while supporting scale-to-zero, on-demand processing.
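
For readers unfamiliar with the MapReduce model that inspires the design, the following is a minimal in-memory sketch of its map, shuffle, and reduce phases applied to logistics-style events. It is an illustration only: the event fields (`vehicle_id`, `speed_kmh`), the function names, and the choice of an average-speed aggregation are all assumptions, and the sketch omits Kafka, Redis, S3, and the five services the paper describes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical event shape: {"vehicle_id": str, "speed_kmh": float}
Event = Dict[str, object]


def map_event(event: Event) -> Iterator[Tuple[str, float]]:
    """Map phase: emit one (key, value) pair per incoming event."""
    yield str(event["vehicle_id"]), float(event["speed_kmh"])


def shuffle(pairs: Iterable[Tuple[str, float]]) -> Dict[str, List[float]]:
    """Shuffle phase: group all values that share a key."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_speeds(key: str, values: List[float]) -> Tuple[str, float]:
    """Reduce phase: collapse each group to its average value."""
    return key, sum(values) / len(values)


def run_pipeline(events: Iterable[Event]) -> Dict[str, float]:
    """Run map, shuffle, and reduce in sequence over a batch of events."""
    pairs = (pair for event in events for pair in map_event(event))
    groups = shuffle(pairs)
    return dict(reduce_speeds(key, values) for key, values in groups.items())


events = [
    {"vehicle_id": "truck-1", "speed_kmh": 60.0},
    {"vehicle_id": "truck-2", "speed_kmh": 45.0},
    {"vehicle_id": "truck-1", "speed_kmh": 80.0},
]
averages = run_pipeline(events)
```

In a distributed setting each phase would typically run as a separate, independently scaled function with events passed between them over a message broker; here everything runs in one process to keep the data flow visible.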

Problem

Research questions and friction points this paper is trying to address.

real-time data processing

logistics systems

data aggregation

scalable data pipelines

event-driven processing

Innovation

Methods, ideas, or system contributions that make the work stand out.

Serverless

MapReduce

Event-driven