LeJOT: An Intelligent Job Cost Orchestration Solution for Databricks Platform

📅 2025-12-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address uncontrolled job cost growth on the Databricks platform, this paper proposes a proactive cost orchestration framework integrating time-series forecasting and constrained optimization. The framework employs an ensemble LSTM/Transformer model to dynamically predict job execution durations and leverages a mixed-integer programming (MIP) solver to generate resource scheduling policies that jointly optimize cloud cost and ensure SLA compliance in real time. These policies are executed via the Databricks Runtime API, enabling closed-loop adaptation within one minute. Unlike static configuration or reactive tuning approaches, our work introduces the first “forecast–optimize–execute” paradigm for real-time, prediction-driven resource adaptation. Evaluated on production workloads, the framework achieves an average 20% reduction in cloud expenditure while maintaining scheduling latency ≤60 seconds—outperforming state-of-the-art baselines across key metrics.
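The "forecast–optimize–execute" loop described above can be sketched as follows. This is a minimal illustrative stand-in, not the paper's implementation: `predict_duration` substitutes a toy analytic model for the ensemble LSTM/Transformer forecaster, and `choose_plan` replaces the MIP solver with brute-force enumeration over a tiny set of hypothetical cluster configurations. All names, rates, and the speedup model are assumptions for illustration.

```python
# Hypothetical sketch of a forecast-optimize-execute step, assuming:
# jobs pay a per-worker-second rate, and speedup is sublinear in workers.

SLA_SECONDS = 3600               # job must finish within one hour
CLUSTER_OPTIONS = [              # (worker count, $ per worker-second)
    (2, 0.0004),
    (4, 0.0004),
    (8, 0.0004),
]

def predict_duration(base_seconds: float, workers: int) -> float:
    """Stand-in for the ensemble forecaster: assume near-linear
    speedup with coordination overhead that grows with cluster size."""
    return base_seconds / workers * (1 + 0.05 * workers)

def choose_plan(base_seconds: float):
    """Stand-in for the MIP solver: enumerate configurations and pick
    the cheapest one whose predicted duration meets the SLA."""
    feasible = []
    for workers, rate in CLUSTER_OPTIONS:
        duration = predict_duration(base_seconds, workers)
        if duration <= SLA_SECONDS:
            cost = duration * workers * rate
            feasible.append((cost, workers, duration))
    # min() compares tuples lexicographically, so cost wins ties first
    return min(feasible) if feasible else None

# A real deployment would now submit this plan via the cluster API;
# here we just print the chosen configuration.
cost, workers, duration = choose_plan(base_seconds=12000.0)
print(f"{workers} workers, {duration:.0f}s predicted, ${cost:.2f}")
```

Note the trade-off the enumeration captures: two workers miss the SLA, while eight workers meet it but pay for the extra coordination overhead, so the four-worker plan is selected as cheapest feasible.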

📝 Abstract
With the rapid advancements in big data technologies, the Databricks platform has become a cornerstone for enterprises and research institutions, offering high computational efficiency and a robust ecosystem. However, managing the escalating operational costs associated with job execution remains a critical challenge. Existing solutions rely on static configurations or reactive adjustments, which fail to adapt to the dynamic nature of workloads. To address this, we introduce LeJOT, an intelligent job cost orchestration framework that leverages machine learning for execution time prediction and a solver-based optimization model for real-time resource allocation. Unlike conventional scheduling techniques, LeJOT proactively predicts workload demands, dynamically allocates computing resources, and minimizes costs while ensuring performance requirements are met. Experimental results on real-world Databricks workloads demonstrate that LeJOT achieves an average 20% reduction in cloud computing costs within a minute-level scheduling timeframe, outperforming traditional static allocation strategies. Our approach provides a scalable and adaptive solution for cost-efficient job scheduling in Data Lakehouse environments.
Problem

Research questions and friction points this paper is trying to address.

Intelligent cost orchestration for Databricks job execution
Dynamic resource allocation to reduce operational cloud expenses
Machine learning-based optimization for workload scheduling efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Machine learning predicts job execution time
Solver-based model optimizes real-time resource allocation
Proactive dynamic scheduling reduces cloud computing costs
👥 Authors

Lizhi Ma — Hangzhou Normal University (Developmental Psychology, Emotion Development, Word Learning, Psycholinguistics, AI in Psychotherapy)
Yi-Xiang Hu — University of Science and Technology of China, Hefei, China
Yuke Wang — University of Science and Technology of China, Hefei, China
Yifang Zhao — University of Science and Technology of China, Hefei, China
Yihui Ren — Brookhaven National Laboratory (Artificial Intelligence, Physics, Network Science, Computer Science)
Jian-Xiang Liao — University of Science and Technology of China, Hefei, China
Feng Wu — National University of Singapore (Machine Learning, Medical Time Series)
Xiang-Yang Li — University of Science and Technology of China, Hefei, China