A Network Arena for Benchmarking AI Agents on Network Troubleshooting

📅 2025-12-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing LLM-based agents lack a low-cost, standardized benchmark for dynamic network fault diagnosis. This paper introduces NIKA, the largest open-source network fault diagnosis benchmark to date, enabling zero-effort replay of real-world network failure scenarios and rapid agent prototyping. Its contributions are threefold: (1) a modular, extensible agent-network interface; (2) comprehensive coverage of five network scenarios (e.g., data centers, ISP networks) and 54 representative fault types; and (3) a high-fidelity simulation environment built from real topologies and operational logs, with standardized APIs and integration into mainstream LLM evaluation frameworks. Experiments reveal that while larger LLMs succeed more often at fault detection, their root-cause localization capability remains significantly limited. NIKA is publicly available on GitHub.

📝 Abstract
Agentic systems, powered by Large Language Models (LLMs), assist network engineers with network configuration synthesis and network troubleshooting tasks. For network troubleshooting, progress is hindered by the absence of standardized, accessible benchmarks for evaluating LLM agents in dynamic network settings at low operational effort. We present NIKA, the largest public benchmark to date for LLM-driven network incident diagnosis and troubleshooting. NIKA targets domain experts and AI researchers alike, providing zero-effort replay of real-world network scenarios and establishing well-defined agent-network interfaces for quick agent prototyping. NIKA comprises hundreds of curated network incidents, spanning five network scenarios from data centers to ISP networks, and covers 54 representative network issues. Lastly, NIKA is modular and extensible by design, offering APIs to facilitate the integration of new network scenarios and failure cases. We evaluate state-of-the-art LLM agents on NIKA and find that while larger models succeed more often in detecting network issues, they still struggle to localize faults and identify root causes. NIKA is open-source and available to the community: https://github.com/sands-lab/nika.
Problem

Research questions and friction points this paper is trying to address.

Lack of standardized benchmarks for LLM agents in network troubleshooting
Need for accessible tools to evaluate AI agents in dynamic network environments
Difficulty in localizing faults and identifying root causes with current LLM agents
Innovation

Methods, ideas, or system contributions that make the work stand out.

Largest public benchmark for LLM-driven network troubleshooting
Zero-effort replay of real-world network scenarios
Modular design with APIs for extensibility
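The abstract describes a well-defined agent-network interface in which an agent interacts with a replayed incident through standardized actions. The sketch below illustrates that general pattern as a gym-style observe/act loop; all names (`NetworkEnv`, `Observation`, `diagnose`, the incident schema) are hypothetical assumptions for illustration and are NOT NIKA's actual API.

```python
# Hypothetical gym-style agent-environment loop for a network-troubleshooting
# benchmark that replays recorded incidents. Illustrative only; not NIKA's API.
from dataclasses import dataclass


@dataclass
class Observation:
    alarms: list       # alerts raised by the (replayed) network
    telemetry: dict    # outputs of commands the agent has issued so far


@dataclass
class NetworkEnv:
    """Replays one recorded incident; the agent queries it via actions."""
    incident: dict
    steps: int = 0

    def reset(self) -> Observation:
        self.steps = 0
        return Observation(alarms=self.incident["alarms"], telemetry={})

    def step(self, action: str) -> Observation:
        # Each action (e.g. a CLI command) returns the output recorded
        # for that command in the incident trace.
        self.steps += 1
        logs = self.incident["logs"].get(action, "no output")
        return Observation(alarms=self.incident["alarms"],
                           telemetry={action: logs})


def diagnose(env: NetworkEnv) -> str:
    """Trivial stand-in for an LLM agent: inspect one device, then guess."""
    env.reset()
    obs = env.step("show interfaces leaf1")
    if "err-disabled" in obs.telemetry["show interfaces leaf1"]:
        return "link failure on leaf1"
    return "unknown"


incident = {
    "alarms": ["leaf1: port Eth1/1 down"],
    "logs": {"show interfaces leaf1": "Eth1/1 err-disabled"},
}
print(diagnose(NetworkEnv(incident)))  # → link failure on leaf1
```

The design choice this illustrates is that the environment, not the agent, owns the incident data: an agent is swapped in by replacing `diagnose`, which is what makes rapid prototyping against recorded scenarios cheap.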
👥 Authors

Zhihao Wang
Peking University
Robotics, Reinforcement Learning

Alessandro Cornacchia
KAUST, Saudi Arabia

Alessio Sacco
Politecnico di Torino, Italy

Franco Galante
Politecnico di Torino, Italy

Marco Canini
Professor of Computer Science, KAUST
Systems, Networking, Distributed Systems, Machine Learning

Dingde Jiang
University of Electronic Science and Technology of China (UESTC), China