WiLLM: An Open Wireless LLM Communication System

📅 2025-06-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing wireless networks struggle to support mobile large language model (LLM) inference due to stringent latency, bandwidth, and energy constraints. Method: This paper introduces the first open-source wireless communication system tailored for mobile LLM inference. Its core innovation is a "tree-branch-fruit" three-tier network slicing architecture enabling multi-user, multi-slice coordinated scheduling and cross-layer joint optimization of telecom resources and AI computation. The system integrates an application-layer tunnel (backward-compatible with legacy devices), a dual-mode scheduler, and cross-layer APIs for flexible deployment, built on OpenAirInterface with integrated slice orchestration and edge-distributed LLM inference. Contribution/Results: The authors release the first LLM-oriented wireless dataset, comprising 1,649,996 samples with synchronized 58-dimensional metrics, alongside two benchmarks, and validate feasibility on resource-constrained devices such as smart glasses. All code, hardware designs, and datasets are fully open-sourced.
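The "tree-branch-fruit" hierarchy can be pictured as a three-level nesting: an operator-level slice tree holds service-level branches, each bearing per-UE resource grants. A minimal sketch of that structure follows; all class and field names here are illustrative, not taken from the WiLLM codebase.

```python
from dataclasses import dataclass, field
from typing import List

# Illustrative model of the three-tier "tree-branch-fruit" slicing idea:
# Tree (operator-level slice) -> Branch (service-level slice) -> Fruit (per-UE grant).

@dataclass
class Fruit:            # per-UE allocation within a branch (hypothetical)
    ue_id: str
    prb_quota: int      # physical resource blocks granted to this UE

@dataclass
class Branch:           # service-level slice, e.g. an LLM-inference slice
    name: str
    fruits: List[Fruit] = field(default_factory=list)

@dataclass
class Tree:             # operator-level slice container
    operator: str
    branches: List[Branch] = field(default_factory=list)

    def total_prbs(self) -> int:
        """Sum PRB quotas across all branches and fruits."""
        return sum(f.prb_quota for b in self.branches for f in b.fruits)

tree = Tree("op-A", [Branch("llm-inference",
                            [Fruit("glasses-01", 20), Fruit("phone-07", 12)])])
print(tree.total_prbs())  # 32
```

The nesting is the point: coordinated multi-user, multi-slice scheduling operates over exactly this kind of two-level grouping of UEs under slices.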

📝 Abstract
The rapid evolution of LLMs threatens to overwhelm existing wireless infrastructure, necessitating architectural innovations for burgeoning mobile LLM services. This paper introduces WiLLM, the first open-source wireless system specifically designed for these services. First, we establish a new paradigm by deploying LLMs in core networks (CNs) with abundant GPUs. This enables distributed inference services, strategically positioning LLM inference at the convergence of backbone bandwidth and the cellular network's edge. Second, we propose an innovative "Tree-Branch-Fruit" extension to the conventional network slicing architecture. This specialized design allows telecom operators to monetize LLM services through slice subscriptions while maintaining infrastructure ownership. Finally, to realize this vision, WiLLM addresses critical limitations in current solutions with several novel capabilities. It features enhanced slice orchestration through a dual-layer slicing architecture, enabling coordinated multi-UE-multi-slice scheduling for finer-grained resource allocation. To ensure universal compatibility, an application-layer tunneling mechanism allows legacy devices without native slicing to access LLM slice services without hardware upgrades. Furthermore, its dual-mode scheduling and cross-layer APIs support flexible deployment from CNs to servers. Built on OpenAirInterface, WiLLM extends this established framework, lowering the adoption barrier for researchers. We also release the first LLM wireless communication dataset with 1,649,996 records and synchronized 58-dimensional metrics, alongside two benchmarks. A case study with smart glasses demonstrates practical viability for resource-constrained devices. WiLLM aims to foster an open platform for cross-layer optimization and AI-telecom convergence. The code, datasets, and hardware details are available at https://openwillm.github.io.
Problem

Research questions and friction points this paper is trying to address.

Optimize wireless infrastructure for mobile LLM services
Enable distributed LLM inference at network edges
Monetize LLM services via innovative network slicing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Deploys LLMs in core networks with GPUs
Introduces Tree-Branch-Fruit network slicing
Uses dual-layer slicing for resource allocation
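One way to picture the finer-grained resource allocation that dual-layer slicing enables is weighted sharing of a fixed resource-block budget across slices. The toy sketch below illustrates that idea only; the slice names, weights, and function are hypothetical, and WiLLM's actual dual-mode scheduler is considerably more elaborate.

```python
def allocate_prbs(total_prbs: int, slice_weights: dict) -> dict:
    """Split a PRB budget across slices in proportion to their weights
    (toy illustration of multi-slice resource allocation)."""
    weight_sum = sum(slice_weights.values())
    alloc = {s: int(total_prbs * w / weight_sum) for s, w in slice_weights.items()}
    # Hand leftover PRBs (lost to integer rounding) to the heaviest slice.
    leftover = total_prbs - sum(alloc.values())
    alloc[max(slice_weights, key=slice_weights.get)] += leftover
    return alloc

print(allocate_prbs(100, {"llm-inference": 3, "eMBB": 1}))
# {'llm-inference': 75, 'eMBB': 25}
```

A real scheduler would additionally coordinate per-UE grants within each slice and react to channel-quality feedback, which is what the coordinated multi-UE-multi-slice scheduling described above adds on top of simple weighted splitting.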