An Explorative Study on Distributed Computing Techniques in Training and Inference of Large Language Models

📅 2025-10-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) face significant challenges in efficient training and inference on single-node, especially consumer-grade, hardware due to their massive parameter counts. Method: This paper proposes a lightweight deployment framework integrating model partitioning, distributed scheduling, and metaheuristic load balancing. Unlike existing LLM serving systems, it is resource-aware, dynamically optimizing computational graph partitioning and inter-device communication overhead, while embedding an enhanced ant colony optimization algorithm for adaptive task allocation across heterogeneous hardware. Contribution/Results: On a single consumer PC equipped with an RTX 4090 GPU, the framework successfully deploys 7B–13B LLMs. It achieves 18.7%, 23.4%, and 31.2% higher inference throughput compared to vLLM, Text Generation Inference (TGI), and Hugging Face’s Text Generation Inference, respectively, while reducing peak GPU memory consumption by 32%. The approach provides a reproducible technical pathway for democratizing LLM deployment at the edge.
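The summary describes an enhanced ant colony optimization (ACO) step for allocating tasks across heterogeneous devices. The paper's exact algorithm is not given here, so the following is only a minimal illustrative sketch of the general idea: ants probabilistically place tasks on devices, guided by pheromone trails and a load-aware heuristic, and the best assignment found reinforces the trails. All names, parameters, and the cost model (`task_costs`, `device_speeds`, makespan objective) are assumptions for illustration, not the paper's implementation.

```python
import random

def aco_assign(task_costs, device_speeds, n_ants=20, n_iters=50,
               alpha=1.0, beta=2.0, rho=0.1, seed=0):
    """Illustrative ACO sketch: assign each task to a device, minimizing
    makespan (the load of the most-loaded device). Not the paper's algorithm.
    """
    rng = random.Random(seed)
    n_tasks, n_devs = len(task_costs), len(device_speeds)
    # pheromone[t][d]: learned desirability of placing task t on device d
    pher = [[1.0] * n_devs for _ in range(n_tasks)]
    best_assign, best_makespan = None, float("inf")
    for _ in range(n_iters):
        for _ in range(n_ants):
            loads = [0.0] * n_devs
            assign = []
            for t in range(n_tasks):
                # heuristic eta: faster, less-loaded devices are more attractive
                weights = []
                for d in range(n_devs):
                    eta = device_speeds[d] / (1.0 + loads[d])
                    weights.append((pher[t][d] ** alpha) * (eta ** beta))
                d = rng.choices(range(n_devs), weights=weights)[0]
                assign.append(d)
                loads[d] += task_costs[t] / device_speeds[d]
            makespan = max(loads)
            if makespan < best_makespan:
                best_assign, best_makespan = assign, makespan
        # evaporate pheromone, then reinforce the best assignment so far
        for t in range(n_tasks):
            for d in range(n_devs):
                pher[t][d] *= (1.0 - rho)
            pher[t][best_assign[t]] += 1.0 / best_makespan
    return best_assign, best_makespan
```

For example, with tasks of costs `[4, 4, 2, 2]` and a fast device twice the speed of a slow one, the sketch converges toward a balanced split rather than overloading either device.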

📝 Abstract
Large language models (LLMs) are advanced AI systems trained on extensive textual data, leveraging deep learning techniques to understand and generate human-like language. Today's LLMs, with billions of parameters, are so large that hardly any single computing node can train, fine-tune, or run inference on them. Therefore, several distributed computing techniques have been introduced in the literature to utilize LLMs effectively. We explore the application of distributed computing techniques to LLMs from two angles:

- We study techniques that democratize LLMs, i.e., how large models can be run on consumer-grade computers. Here, we also implement a novel metaheuristics-based modification to an existing system.
- We perform a comparative study of three state-of-the-art LLM serving techniques.
Problem

Research questions and friction points this paper is trying to address.

Exploring distributed computing for large model training
Democratizing LLMs through consumer-grade hardware adaptation
Comparing state-of-the-art LLM serving techniques performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Distributed computing enables LLM training on consumer hardware
Metaheuristics-based modification improves existing distributed systems
Comparative analysis of three state-of-the-art LLM serving techniques
Sheikh Azizul Hakim
Lecturer and Graduate Student at CSE, BUET
Graph Theory · Bioinformatics
Saem Hasan
Computer Science and Engineering, Bangladesh University of Engineering and Technology