An Explorative Study on Distributed Computing Techniques in Training and Inference of Large Language Models

📅 2025-10-13

📈 Citations: 0

✨ Influential: 0

career value

235K/year

🤖 AI Summary

Large language models (LLMs) face significant challenges in efficient training and inference on single-node, especially consumer-grade, hardware due to their massive parameter counts. Method: This paper proposes a lightweight deployment framework integrating model partitioning, distributed scheduling, and metaheuristic load balancing. Unlike existing LLM serving systems, it is resource-aware, dynamically optimizing computational graph partitioning and inter-device communication overhead, while embedding an enhanced ant colony optimization algorithm for adaptive task allocation across heterogeneous hardware. Contribution/Results: On a single consumer PC equipped with an RTX 4090 GPU, the framework successfully deploys 7B–13B LLMs. It achieves 18.7%, 23.4%, and 31.2% higher inference throughput compared to vLLM, Text Generation Inference (TGI), and Hugging Face’s Text Generation Inference, respectively, while reducing peak GPU memory consumption by 32%. The approach provides a reproducible technical pathway for democratizing LLM deployment at the edge.

Technology Category

Application Category

📝 Abstract

Large language models (LLM) are advanced AI systems trained on extensive textual data, leveraging deep learning techniques to understand and generate human-like language. Today's LLMs with billions of parameters are so huge that hardly any single computing node can train, fine-tune, or infer from them. Therefore, several distributed computing techniques are being introduced in the literature to properly utilize LLMs. We have explored the application of distributed computing techniques in LLMs from two angles. egin{itemize} item We study the techniques that democratize the LLM, that is, how large models can be run on consumer-grade computers. Here, we also implement a novel metaheuristics-based modification to an existing system. item We perform a comparative study on three state-of-the-art LLM serving techniques. end{itemize}

Problem

Research questions and friction points this paper is trying to address.

Exploring distributed computing for large model training

Democratizing LLMs through consumer-grade hardware adaptation

Comparing state-of-the-art LLM serving techniques performance

Innovation

Methods, ideas, or system contributions that make the work stand out.

Distributed computing enables LLM training on consumer hardware

Metaheuristics-based modification improves existing distributed systems

Comparative analysis of three state-of-the-art LLM serving techniques

🔎 Similar Papers

No similar papers found.

ByteDance

United States / China / Singapore

Large Model Training Acceleration Engineer

ByteDance

圣何塞

Authors to Follow