🤖 AI Summary
To address high GPU costs, volatile spot-instance prices and availability, and stringent deadline constraints in large language model fine-tuning, this paper proposes a deadline-aware online scheduling framework. Methodologically, it first reveals the short-term predictability of spot-market prices and resource availability; designs a prediction-driven resource allocation algorithm based on commitment levels; incorporates a prediction-free fallback mechanism for robustness; and develops an adaptive policy selection module with an $O(\sqrt{T})$ regret bound. The framework integrates integer programming modeling, error-sensitivity analysis, and heterogeneous instance co-scheduling to jointly optimize cost, timeliness, and reliability under dynamic market conditions. Experiments demonstrate up to 54.8% utility improvement over baseline methods, with a performance bound that tightens as prediction error decreases.
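To make the integer programming modeling concrete, here is a hypothetical toy sketch (not the paper's actual formulation): choose integer counts of spot and on-demand instances per time slot to meet a total work requirement before a deadline, subject to per-slot spot availability caps, minimizing cost. All prices, availabilities, and caps below are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

# Toy horizon of T=4 slots; all numbers are illustrative, not from the paper.
T = 4
spot_price = np.array([0.30, 0.25, 0.90, 0.40])  # $/instance-hour per slot
spot_avail = np.array([2, 3, 0, 2])              # max spot instances per slot
ondemand_price = 1.00                            # flat on-demand price
W = 6                                            # instance-hours needed by deadline

# Decision vector z = [x_1..x_T, y_1..y_T]: spot and on-demand counts per slot.
c = np.concatenate([spot_price, np.full(T, ondemand_price)])

# Deadline constraint: total instance-hours across the horizon must reach W.
work = LinearConstraint(np.ones(2 * T), lb=W, ub=np.inf)

# Spot usage is capped by availability; on-demand capped at 3 per slot here.
bounds = Bounds(lb=np.zeros(2 * T),
                ub=np.concatenate([spot_avail, np.full(T, 3)]))

res = milp(c, constraints=work, integrality=np.ones(2 * T), bounds=bounds)
# In this toy instance the optimum uses only spot capacity (cost 1.75).
print(res.x, res.fun)
```

The real formulation in the paper additionally models price and availability *dynamics*; this sketch only shows the static skeleton of a mixed-instance cost minimization.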
📝 Abstract
As foundation models grow in size, fine-tuning them becomes increasingly expensive. While GPU spot instances offer a low-cost alternative to on-demand resources, their volatile prices and availability make deadline-aware scheduling particularly challenging. We tackle this difficulty by using a mix of spot and on-demand instances. Distinctively, we show the predictability of prices and availability in a spot instance market, the power of prediction in enabling cost-efficient scheduling, and its sensitivity to estimation errors. We formulate an integer programming problem that captures the use of mixed instances under both price and availability dynamics. We propose a prediction-based online allocation algorithm, built on the committed horizon control approach, that leverages a *commitment level* to enforce partial commitment to a sequence of decisions. For cases where predictions become inaccurate, we further present a complementary online algorithm that requires no predictions. An online policy selection algorithm then learns the best policy from a pool constructed by varying the parameters of both algorithms. We prove that the prediction-based algorithm achieves tighter performance bounds as prediction error decreases, while the policy selection algorithm possesses a regret bound of $\mathcal{O}(\sqrt{T})$. Experimental results demonstrate that our online framework can adaptively select the best policy under varying spot market dynamics and prediction quality, consistently outperforming baselines and improving utility by up to 54.8%.
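One standard way to select among a pool of policies with $\mathcal{O}(\sqrt{T})$ regret is the multiplicative-weights (Hedge) algorithm; the sketch below is a generic illustration of that technique, not necessarily the paper's construction. The function name, loss matrix, and learning rate are assumptions for the example.

```python
import numpy as np

def hedge_select(losses, eta, seed=0):
    """Multiplicative-weights (Hedge) selection over a pool of K policies.

    losses: (T, K) array of per-round losses in [0, 1], one column per policy.
    eta: learning rate; eta ~ sqrt(ln K / T) gives O(sqrt(T ln K)) regret.
    Returns the sequence of chosen policy indices and the final weights.
    """
    T, K = losses.shape
    log_w = np.zeros(K)              # log-weights for numerical stability
    rng = np.random.default_rng(seed)
    choices = []
    for t in range(T):
        p = np.exp(log_w - log_w.max())
        p /= p.sum()                 # sampling distribution over policies
        choices.append(int(rng.choice(K, p=p)))
        log_w -= eta * losses[t]     # downweight policies that did poorly
    return choices, np.exp(log_w - log_w.max())

# Toy run: policy 0 consistently incurs lower loss than policy 1.
T = 200
losses = np.column_stack([np.full(T, 0.1), np.full(T, 0.9)])
eta = np.sqrt(np.log(2) / T)
choices, weights = hedge_select(losses, eta)
```

After enough rounds the weight mass concentrates on the better policy, so later choices are almost always policy 0, mirroring how an adaptive selector can track the best member of a policy pool.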