🤖 AI Summary
This work addresses the challenge of achieving both high reliability and cost efficiency in multi-node Spot instance deployments, which are susceptible to unpredictable interruptions. To overcome the limitations of existing single-node approaches, the study proposes a recommendation system tailored to multi-node scenarios that leverages large-scale, publicly available availability data. The system continuously collects real-time Spot availability metrics across multiple regions, analyzes historical interruption patterns, and dynamically optimizes resource allocation accordingly. Experimental results show that, in a multi-region setup, the proposed approach achieves 81.28% greater availability and 2.84% more cost savings than SpotVerse. Compared to AWS SpotFleet, it provides 21.6% higher stability and 26.3% greater cost savings, improving both reliability and cost in Spot-based computing environments.
📝 Abstract
Cloud vendors offer discounted spot instances to maximize surplus resource utilization, but these instances are subject to the risk of sudden interruption. Traditional pricing datasets have been employed to predict this risk, yet recent policy changes by cloud vendors have diminished their effectiveness. To promote spot instance usage, public cloud vendors provide instant availability datasets to help users mitigate interruption risks. While existing research utilizing this data has proposed methods to reduce interruptions, these studies have primarily focused on single-node instances, overlooking the stability of multi-node environments widely adopted for modern cloud workloads.
This paper proposes SpotVista, a system that recommends a resource pool of reliable and cost-efficient multi-node spot instances by leveraging various publicly available datasets. To achieve this, SpotVista collects a large-scale multi-node availability dataset while overcoming significant query limitations. Through a thorough analysis of multi-node spot instance availability behavior, SpotVista establishes a methodology for recommending cost-efficient and reliable multi-node configurations.
To evaluate how effectively the proposed methodology reflects multi-node availability and cost efficiency, extensive real-world interruption experiments were conducted. The results demonstrate that SpotVista outperforms the state-of-the-art work, SpotVerse, achieving 81.28% greater availability and 2.84% more cost savings in a multi-region setup. When compared to a publicly available service, AWS SpotFleet, SpotVista provides 21.6% higher stability and 26.3% greater cost savings.