🤖 AI Summary
This study investigates the practical utility of large language models (LLMs) in network operations, a domain built on deterministic protocols and configurations. We deployed an LLM-powered chatbot, integrating retrieval-augmented generation, CLI control of network devices, and access to a ticket system, in a demonstration network of 21 racks built for a public exhibition, and 105 network engineers used it while building and operating the network. Analysis of the chat histories shows that 68.1% of the evaluations given by participants were positive, which provides a quantitative baseline for LLM helpfulness in network operations. Our results also indicate that users' understanding of the chatbot's capabilities is important for eliciting better responses, and we support these findings with detailed use case analyses and actual user-chatbot interactions.
📝 Abstract
This paper reports on a real-world case study in which over 100 network engineers assessed how a Large Language Model (LLM) can assist in building and operating a network. The versatility of LLMs has accelerated their adoption across a wide range of domains, and assisting network operations is one promising application. Unlike deterministic protocols and configurations, LLMs are probabilistic models; clarifying how, and to what extent, they can help in network operations is therefore a crucial step toward adopting them. To offer practical insights into this issue, we conducted an extensive experiment on a large demonstration network built for a public exhibition, consisting of 21 racks with heterogeneous network devices. In the experiment, a total of 105 network engineers used an LLM-based chatbot while building and operating the network. The chatbot was equipped with three external functions: retrieval-augmented generation for domain-specific knowledge, CLI control of devices running on the network, and access to a ticket system. The participants rated the chatbot's responses on a best-effort basis. Analysis of the chat histories shows that 68.1% of the evaluations were positive, providing a quantitative baseline for the LLM's helpfulness in network operations. Our results also demonstrate that understanding the capabilities of the chatbot is important for eliciting better responses. Moreover, we provide detailed use case analyses and share actual user-chatbot interactions.