Machine Learning and CPU (Central Processing Unit) Scheduling Co-Optimization over a Network of Computing Centers

📅 2025-10-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the coupled optimization challenge of CPU resource allocation and local model training in distributed machine learning. It proposes an all-time-feasible co-optimization framework that jointly models computational resource scheduling and distributed training dynamics, supporting time-varying communication topologies and log-scale quantization of inter-node information exchange. By integrating perturbation analysis, Lyapunov stability theory, and spectral graph analysis, with distributed SVM and regression as example training models, the framework guarantees both resource supply-demand balance at every iteration and consensus-type convergence. Compared to state-of-the-art CPU scheduling solutions, the method reduces the cost optimality gap by more than 50%, and convergence toward the optimum is rigorously established, yielding a practical solution for joint optimization under dynamic network conditions.
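To make the all-time-feasibility claim concrete, below is a minimal sketch (our construction, not the paper's code; the function name, step size, and weight convention are assumptions) of a Laplacian-gradient allocation update over balanced weights, the standard mechanism by which the supply-demand balance is preserved at every iteration:

```python
import numpy as np

def allocation_step(x, grads, W, alpha=0.05):
    """One anytime-feasible CPU-allocation update (illustrative sketch).

    x:     (n,) current CPU shares; sum(x) equals the total demand
    grads: (n,) marginal local costs f_i'(x_i)
    W:     (n, n) nonnegative balanced weights of the current
           (possibly time-varying) topology: row sums equal column
           sums, and W[i, j] == 0 where i does not hear from j
    alpha: step size

    Each node moves against the weighted differences between its
    marginal cost and its neighbors'.  With balanced W the changes
    cancel in the aggregate, so sum(x) is preserved at every
    iteration and the supply-demand balance never breaks.
    """
    diff = grads[:, None] - grads[None, :]   # diff[i, j] = g_i - g_j
    dx = -(W * diff).sum(axis=1)             # dx_i = -sum_j W_ij (g_i - g_j)
    return x + alpha * dx
```

For quadratic local costs f_i(x_i) = a_i * x_i**2, one would pass grads = 2 * a * x; iterating drives the marginal costs toward a common value (the classic resource-allocation optimality condition) while np.sum(x) never leaves its initial value, whatever the balanced topology does over time.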

📝 Abstract
In the rapidly evolving research on artificial intelligence (AI), the demand for fast, computationally efficient, and scalable solutions has increased in recent years. This paper considers the problem of optimizing the computing resources for distributed machine learning (ML) and optimization. Given a set of data distributed over a network of computing nodes/servers, the idea is to optimally assign the CPU (central processing unit) usage while simultaneously training each computing node locally via its own share of data. This formulates the problem as a co-optimization setup to (i) optimize the data processing and (ii) optimally allocate the computing resources. The information-sharing network among the nodes may be time-varying, but with balanced weights to ensure consensus-type convergence of the algorithm. The algorithm is all-time feasible, which implies that the computing resource-demand balance constraint holds at all iterations of the proposed solution. Moreover, the solution allows addressing possible log-scale quantization over the information-sharing channels to exchange log-quantized data. As example applications, distributed support-vector-machine (SVM) and regression are considered as the ML training models. Results from perturbation theory, along with Lyapunov stability and eigen-spectrum analysis, are used to prove convergence towards the optimal case. Compared to existing CPU scheduling solutions, the proposed algorithm improves the cost optimality gap by more than 50%.
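The log-scale quantization mentioned in the abstract admits a compact sketch; the uniform-in-log-domain grid and the name `delta` below are our assumptions, not the paper's notation. The point of a logarithmic quantizer is a bounded *relative* error, which is what perturbation-style convergence arguments typically need:

```python
import numpy as np

def log_quantize(z, delta=0.1):
    """Logarithmic quantizer: snap |z| to a uniform grid in log domain.

    The relative error is uniformly bounded: q(z)/z lies in
    [exp(-delta/2), exp(delta/2)], roughly 1 +/- delta/2 for small
    delta.  By convention q(0) = 0.
    """
    z = np.asarray(z, dtype=float)
    out = np.zeros_like(z)
    nz = z != 0
    out[nz] = np.sign(z[nz]) * np.exp(
        delta * np.round(np.log(np.abs(z[nz])) / delta)
    )
    return out
```

Nodes would pass their iterates through `log_quantize` before transmission; a coarser `delta` saves communication at the price of a larger, but still relative, perturbation.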
Problem

Research questions and friction points this paper is trying to address.

Optimizing CPU scheduling for distributed machine learning training
Co-optimizing data processing and computing resource allocation (a formulation sketch follows this list)
Ensuring algorithm convergence under time-varying network connections
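Reading the three friction points together, one plausible formulation (our hedged reconstruction, not the paper's exact notation) couples a per-node training loss with a per-node computing cost under a global resource budget:

```latex
\min_{\{w_i\},\,\{x_i\}} \; \sum_{i=1}^{n} \Big( f_i(w_i;\mathcal{D}_i) + c_i(x_i) \Big)
\quad \text{s.t.} \quad \sum_{i=1}^{n} x_i = D, \qquad w_1 = w_2 = \dots = w_n ,
```

where f_i is node i's training loss on its local data shard D_i, c_i its CPU-usage cost, x_i its CPU share, and D the total demand. The consensus constraint couples the learning side, the balance constraint couples the scheduling side, and "all-time feasibility" means the balance constraint holds at every iterate, not just in the limit.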
Innovation

Methods, ideas, or system contributions that make the work stand out.

Co-optimizing CPU scheduling with distributed machine learning training (see the SVM sketch after this list)
Using balanced, time-varying networks to guarantee consensus-type convergence
Incorporating log-scale quantization for efficient inter-node data exchange
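For the training side, the paper uses distributed SVM and regression as example models. The hinge-loss subgradient below is the standard textbook form, not code copied from the paper, and the names and regularizer `lam` are our assumptions:

```python
import numpy as np

def svm_subgradient(w, X, y, lam=0.01):
    """Local subgradient of a regularized hinge loss on one node's shard.

    Loss: (1/m) * sum_k max(0, 1 - y_k * <w, x_k>) + (lam/2) * ||w||^2
    X: (m, d) local features, y: (m,) labels in {-1, +1}.
    """
    margins = y * (X @ w)
    active = margins < 1.0                        # margin-violating samples
    g = -(X[active] * y[active][:, None]).sum(axis=0) / len(y)
    return g + lam * w
```

In the distributed loop, each node would take a local step `w_i -= lr * svm_subgradient(w_i, X_i, y_i)`, exchange possibly log-quantized iterates with its current neighbors, and mix them with balanced weights; the regression case swaps the hinge loss for a squared loss.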