🤖 AI Summary
This study examines prolonged task completion times, low resource utilization, and delayed resource release in Docker and Kubernetes containers running compute-intensive workloads (big data and deep learning) on cloud-native platforms. It evaluates the performance impact of different resource management schemes through system-level monitoring, using cgroups and metrics-server, and multi-workload experiments. The measurements show that configuration choices strongly affect task completion time: changing the default setting reduces it by up to 79.4%, while differing resource management schemes on the two platforms increase it by up to 96.7%. Resource release is delayed by up to 116.7% across systems. These results offer empirical guidance for selecting and configuring a hosting platform in cloud-native environments.
📝 Abstract
Over the last decade, businesses have increasingly adopted cloud technology and incorporated it into their internal processes. Cloud-based deployment provides on-demand availability without active management. More recently, the concept of the cloud-native application has been proposed; it represents an invaluable step toward helping organizations develop software faster and update it more frequently to achieve dramatic business outcomes. Cloud-native is an approach to building and running applications that exploits the advantages of the cloud computing delivery model. It is more about how applications are created and deployed than where. Container-based virtualization technologies, such as Docker and Kubernetes, serve as the foundation for cloud-native applications. This paper investigates the performance of two popular compute-intensive applications, big data and deep learning, in a cloud-native environment. We analyze the system overhead and resource usage of these applications. Through extensive experiments, we show that completion time is reduced by up to 79.4% by changing the default setting and increased by up to 96.7% by the different resource management schemes on the two platforms. Additionally, resource release is delayed by up to 116.7% across different systems. Our work can guide developers, administrators, and researchers in designing and deploying their applications by selecting and configuring a hosting platform.