Cloud-native and Distributed Systems for Efficient and Scalable Large Language Models -- A Research Agenda

📅 2026-04-18

🤖 AI Summary
This research agenda addresses the high computational overhead and scalability challenges of training and serving large language models (LLMs) by examining how cloud-native and distributed architectures can support them. It covers microservices, autoscaling, hybrid cloud-edge deployment, serverless inference, and federated learning, and considers quantum computing as an emerging direction. Data management and resource optimization are treated as the central deployment challenges. The paper closes with a roadmap that calls for continued research, standardization, and cross-sector collaboration to sustain LLM growth in both research and enterprise settings.

📝 Abstract
The rapid rise of Large Language Models (LLMs) has revolutionized various artificial intelligence (AI) applications, from natural language processing to code generation. However, the computational demands of these models, particularly in training and inference, present significant challenges. Traditional systems are often unable to meet these requirements, necessitating the integration of cloud-native and distributed architectures. This paper explores the role of cloud platforms and distributed systems in supporting the scalability, efficiency, and optimization of LLMs. We discuss the complexities of LLM deployment, including data management, resource optimization, and the need for microservices, autoscaling, and hybrid cloud-edge solutions. Additionally, we examine emerging research trends, such as serverless inference, quantum computing, and federated learning, and their potential to drive the next phase of LLM innovation. The paper concludes with a roadmap for future developments, emphasizing the need for continued research, standardization, and cross-sector collaboration to sustain the growth of LLMs in both research and enterprise applications.
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Cloud-native
Distributed Systems
Scalability
Computational Demands

Innovation

Methods, ideas, or system contributions that make the work stand out.

cloud-native
distributed systems
serverless inference
federated learning
autoscaling

Authors
Minxian Xu
Associate Professor, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences
Cloud Computing, Microservices, LLM Inference
Jingfeng Wu
University of California, Berkeley
deep learning theory, machine learning, optimization, statistical learning theory
Shengye Song
Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
Satish Narayana Srirama
School of Computer and Information Sciences, University of Hyderabad, India
Edge/Cloud Computing, IoT/Edge Analytics, Mobile Web Services, Mobile Cloud, #unitartucs
Bahman Javadi
Full Professor, Western Sydney University
Distributed Computing, Edge Computing, Reliability, Internet of Things, Smart Computing
Rajiv Ranjan
PhD Scholar, Plaksha University
Foundation Models, Self Supervised Learning, Computer Vision, Remote Sensing, GIS
Devki Nandan Jha
Newcastle University
Cloud Computing, Containers, Internet of Things, Trusted Computing
Sa Wang
Associate Professor, Institute of Computing Technology, CAS
Cloud Computing, Operating Systems
Wenhong Tian
University of Electronic Science and Technology of China
Approximation Algorithms for NP-Hard Problems, Resource Scheduling, Network Modeling and Performance Optimization
Huanle Xu
University of Macau, Macau SAR, China
Li Li
University of Macau
Mobile Computing, Cloud Computing
Zizhao Mo
University of Macau, Macau SAR, China
Shuo Ren
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Thomas Kunz
Professor of Systems and Computer Engineering, Carleton University
WSN, MANETs, IoT, SDN, NFV
Petar Kochovski
University of Ljubljana, Ljubljana, Slovenia
Vlado Stankovski
Full Professor, University of Ljubljana
Software Engineering, Distributed Computing, Artificial Intelligence, Semantics
Kejiang Ye
Professor, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences
Cloud Computing, AI Systems, Industrial Internet
Chengzhong Xu
University of Macau, Macau SAR, China
Rajkumar Buyya
School of Computing and Information Systems, The University of Melbourne; Fellow of IEEE & Academia Europaea
Cloud Computing, Data Centers, Edge Computing, Internet of Things, Quantum Computing