Spot-and-Scoot: Peeking Into Spot Instance Availability

📅 2026-04-08
📈 Citations: 1
Influential: 0
🤖 AI Summary
High observation costs hinder effective monitoring of the dynamic availability of cloud spot instances. This work proposes Spot-and-Scoot (SnS), a method that submits spot instance requests and cancels them as soon as they are accepted for provisioning. This yields binary availability signals at near-zero instance cost, and submitting multiple concurrent requests yields an estimate of available capacity. Rather than keeping instances running, SnS actively probes availability using early-stage signals from the cloud provider's provisioning lifecycle. An analysis of real interruption events shows that co-interruptions within the same instance type and availability zone occur within three minutes in over 92% of cases. Experiments across 68 instance types and 15 regions on AWS and Azure show that the method achieves an F1-macro score of up to 0.90 for current availability modeling and maintains 0.85 at a 60-minute prediction horizon. Trace-driven TPC-DS simulations further indicate that SnS-based prediction can reduce lost computation compared to an unguided baseline.
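The probing idea can be illustrated with a toy model. This is a minimal sketch, not the authors' implementation: `SpotPool`, `SimulatedPool`, `ProbeResult`, and `probe` are hypothetical names, and the simulated pool assumes a fixed number of free slots that an accepted request occupies until it is cancelled. All requests at one measurement point stay outstanding together before any are cancelled, so the accepted count reflects how many fit into the current capacity (capped at the number submitted). A real deployment would call the provider's own request and cancel APIs instead of the toy pool.

```python
from dataclasses import dataclass
from typing import Protocol


class SpotPool(Protocol):
    """Hypothetical interface to one (instance type, zone) spot pool."""

    def submit(self) -> bool: ...  # True if the request is accepted for provisioning
    def cancel(self) -> None: ...  # cancel one accepted request before it runs


class SimulatedPool:
    """Toy pool with a fixed number of free spot slots (illustration only)."""

    def __init__(self, free_capacity: int) -> None:
        self.free = free_capacity

    def submit(self) -> bool:
        if self.free > 0:
            self.free -= 1  # an accepted request occupies one slot
            return True
        return False

    def cancel(self) -> None:
        self.free += 1  # cancelling releases the slot again


@dataclass
class ProbeResult:
    submitted: int
    accepted: int

    @property
    def available(self) -> bool:
        # Binary availability signal: was at least one request accepted?
        return self.accepted > 0

    @property
    def saturated(self) -> bool:
        # If every request was accepted, capacity is only known to be >= submitted.
        return self.accepted == self.submitted


def probe(pool: SpotPool, n_requests: int) -> ProbeResult:
    """Submit n_requests at once, then cancel every accepted one."""
    # Keep all requests outstanding together so they compete for the same capacity.
    outcomes = [pool.submit() for _ in range(n_requests)]
    for ok in outcomes:
        if ok:
            pool.cancel()  # release before the instance would enter the running state
    return ProbeResult(submitted=n_requests, accepted=sum(outcomes))


if __name__ == "__main__":
    print(probe(SimulatedPool(free_capacity=3), n_requests=5))
    # ProbeResult(submitted=5, accepted=3)
```

Cancelling only after all outcomes are known matters in this toy model: cancelling each request right after its acceptance would free the slot for the next request, and every probe would then report full acceptance regardless of capacity.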
📝 Abstract
Spot instances offer significant cost savings of up to 90% over on-demand prices, making them an attractive resource for large-scale computing workloads. However, understanding their availability dynamics is essential for building systems that tolerate interruptions, and observing this availability directly requires keeping instances running, which incurs costs that scale with the number of monitored instance types and their per-instance price. We propose Spot-and-Scoot (SnS), a cost-efficient method that collects spot instance availability signals by leveraging the cloud provider's provisioning lifecycle. Since the outcome of a spot request is determined before the instance enters the running state, SnS submits requests and cancels them upon provisioning acceptance, collecting binary availability signals at near-zero instance cost. Submitting multiple concurrent requests per measurement point further yields a quantitative estimate of available capacity. We validate SnS through simultaneous collection of probing signals and actual running instance traces across 68 instance types and 15 regions on both AWS and Azure, totaling 336,033 spot requests. Analysis of 2,635 real-world interruption events reveals that co-interruptions within the same instance type and availability zone occur within three minutes in over 92% of cases, motivating a binary availability formulation. Based on this formulation, we derive three complementary features from SnS signals and demonstrate that their combination achieves an F1-macro score of up to 0.90 for current availability modeling and maintains 0.85 at a 60-minute prediction horizon. A trace-driven simulation using TPC-DS workloads further demonstrates the potential of SnS-based prediction to reduce lost computation compared to an unguided baseline.
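The three-minute co-interruption statistic can be computed along these lines. This is a minimal sketch under an assumed grouping, and the paper's exact grouping and counting rules may differ. Each hypothetical event record carries an instance type, an availability zone, a `run_id` identifying instances launched together, and an interruption timestamp in seconds. A group with at least two interruptions counts as co-interrupted within the window if the spread between its first and last interruption is at most 180 s.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical event record: (instance_type, zone, run_id, interrupted_at_seconds).
Event = Tuple[str, str, str, float]


def co_interruption_fraction(events: Iterable[Event], window_s: float = 180.0) -> float:
    """Share of multi-instance groups whose interruptions all fall within window_s."""
    groups: Dict[Tuple[str, str, str], List[float]] = defaultdict(list)
    for instance_type, zone, run_id, t in events:
        groups[(instance_type, zone, run_id)].append(t)
    # Only groups with at least two interrupted instances count as co-interruptions.
    spreads = [max(ts) - min(ts) for ts in groups.values() if len(ts) >= 2]
    if not spreads:
        return float("nan")  # no co-interruption cases to evaluate
    return sum(s <= window_s for s in spreads) / len(spreads)


if __name__ == "__main__":
    sample = [
        ("m5.large", "us-east-1a", "r1", 0.0),
        ("m5.large", "us-east-1a", "r1", 90.0),   # spread 90 s: within window
        ("m5.large", "us-east-1a", "r2", 0.0),
        ("m5.large", "us-east-1a", "r2", 600.0),  # spread 600 s: outside window
        ("c5.xlarge", "us-east-1b", "r1", 10.0),  # single event: not a co-interruption
    ]
    print(co_interruption_fraction(sample))  # 0.5
```

A high fraction under such a metric is what makes a binary formulation reasonable: if instances in a pool tend to be reclaimed together, the pool behaves roughly as either available or not, rather than losing capacity gradually.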
Problem

Research questions and friction points this paper is trying to address.

spot instances
availability monitoring
cost-efficient probing
interruption tolerance
cloud computing
Innovation

Methods, ideas, or system contributions that make the work stand out.

spot instances
availability probing
cost-efficient monitoring
interruption prediction
cloud computing
🔎 Similar Papers
2024-05-20 · 2024 IEEE International Conference on Cloud Computing Technology and Science (CloudCom) · Citations: 0