Meeting SLOs, Slashing Hours: Automated Enterprise LLM Optimization with OptiKIT

📅 2026-01-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenges enterprises face in deploying large language models (LLMs), where limited computational budgets and a lack of specialized optimization expertise often result in low GPU utilization and inefficient deployment. To democratize large-scale LLM optimization for non-expert teams, we propose OptiKIT, a novel framework that integrates dynamic resource scheduling, a staged optimization pipeline, automated cleanup mechanisms, and enterprise-grade system integration within a distributed architecture to automate model compression and tuning. In real-world production environments, OptiKIT substantially lowers the barrier to AI deployment, enabling application teams without deep optimization experience to reliably meet performance targets while achieving over a 2× improvement in GPU throughput.

📝 Abstract
Enterprise LLM deployment faces a critical scalability challenge: organizations must optimize models systematically to scale AI initiatives within constrained compute budgets, yet the specialized expertise required for manual optimization remains scarce. This challenge is particularly evident in managing GPU utilization across heterogeneous infrastructure while enabling teams with diverse workloads and limited LLM optimization experience to deploy models efficiently. We present OptiKIT, a distributed LLM optimization framework that democratizes model compression and tuning by automating complex optimization workflows for non-expert teams. OptiKIT provides dynamic resource allocation, staged pipeline execution with automatic cleanup, and seamless enterprise integration. In production, it delivers more than a 2× improvement in GPU throughput while empowering application teams to achieve consistent performance gains without deep LLM optimization expertise. We share both the platform design and key engineering insights into resource allocation algorithms, pipeline orchestration, and integration patterns that enable large-scale, production-grade democratization of model optimization. Finally, we open-source the system to enable external contributions and broader reproducibility.
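The abstract's core mechanisms — dynamic resource allocation plus staged pipeline execution with automatic cleanup — can be pictured with a small sketch. This is purely illustrative and not OptiKIT's actual API: `GPUPool` and `stage` are hypothetical names, and the stage names are made up. The idea shown is that each stage reserves only the GPUs it needs from a shared pool and releases them automatically even if the stage fails, so a crashed optimization job cannot strand resources.

```python
from contextlib import contextmanager

class GPUPool:
    """Tracks free GPU IDs for dynamic allocation across pipeline stages."""
    def __init__(self, num_gpus):
        self.free = set(range(num_gpus))

    def acquire(self, n):
        if n > len(self.free):
            raise RuntimeError(f"requested {n} GPUs, only {len(self.free)} free")
        # Hand out n distinct GPU IDs from the free set.
        return {self.free.pop() for _ in range(n)}

    def release(self, gpus):
        self.free |= gpus

@contextmanager
def stage(pool, name, gpus_needed):
    """Runs one pipeline stage; GPU release happens automatically."""
    gpus = pool.acquire(gpus_needed)
    try:
        yield gpus
    finally:
        pool.release(gpus)  # automatic cleanup, even when the stage raises

pool = GPUPool(num_gpus=8)
# Staged execution: each stage holds resources only while it runs.
with stage(pool, "quantize", gpus_needed=2) as gpus:
    print(f"quantize running on GPUs {sorted(gpus)}")
with stage(pool, "distill", gpus_needed=4) as gpus:
    print(f"distill running on GPUs {sorted(gpus)}")
assert len(pool.free) == 8  # every GPU returned after the pipeline
```

The `try`/`finally` in the context manager is what "automatic cleanup" amounts to here: releasing resources is tied to stage scope rather than to a manual call, which is one plausible way a framework keeps GPU utilization high across many non-expert teams' jobs.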
Problem

Research questions and friction points this paper is trying to address.

LLM optimization
enterprise deployment
GPU utilization
scalability
resource constraints
Innovation

Methods, ideas, or system contributions that make the work stand out.

automated LLM optimization
distributed optimization framework
dynamic resource allocation
model compression
enterprise AI deployment
Nicholas Santavas
eBay, Foundation Models Team, Amsterdam, the Netherlands
Kareem Eissa
Nile University
Artificial Intelligence, Machine Learning, Natural Language Processing
Patrycja Cieplicka
eBay, Foundation Models Team, Amsterdam, the Netherlands
Piotr Florek
eBay, Foundation Models Team, Amsterdam, the Netherlands
Matteo Nulli
eBay, University of Amsterdam
Multimodal Learning, Vision Language Models, Deep Learning
Stefan Vasilev
eBay, Foundation Models Team, Amsterdam, the Netherlands
Seyyed Hadi Hashemi
eBay
Information Retrieval, Natural Language Processing, Large Language Models
Antonios Gasteratos
Democritus University of Thrace, Xanthi, Greece
Shahram Khadivi
eBay Inc.
Natural Language Processing and Machine Learning