🤖 AI Summary
Problem: Traditional extreme-case testing for network software relies heavily on manual boundary-value analysis, resulting in low efficiency and insufficient coverage. Method: This paper proposes the first large language model (LLM)-based automated extreme-case testing framework. It leverages LLMs to automatically infer input constraints of protocols and algorithms, generate constraint-violating extreme test cases, and synthesize filtering code to detect anomalous behaviors. Contribution/Results: The approach advances extreme-case testing from syntactic boundary analysis to a semantic-driven paradigm centered on constraint discovery and violation. It supports diverse distributed protocols (e.g., HTTP, BGP, DNS) and centralized network algorithms (e.g., Dijkstra). Evaluated on 12 real-world open-source implementations, the framework uncovered 17 previously unknown vulnerabilities—including 5 assigned CVE identifiers—demonstrating its effectiveness, scalability, and practical utility.
📝 Abstract
Physicists often manually consider extreme cases when testing a theory. In this paper, we show how to automate extremal testing of network software using LLMs in two steps: first, ask the LLM to generate input constraints (e.g., DNS name length limits); then ask the LLM to generate tests that violate the constraints. We demonstrate how easy this process is by generating extremal tests for HTTP, BGP and DNS implementations, each of which uncovered new bugs. We show how this methodology extends to centralized network software such as shortest path algorithms, and how LLMs can generate filtering code to reject extremal input. We propose using agentic AI to further automate extremal testing. LLM-generated extremal testing goes beyond an old technique in software testing called Boundary Value Analysis.
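The two-step pipeline in the abstract can be illustrated with a small, hypothetical sketch for the DNS example. Step 1 (constraint inference, normally done by the LLM) is represented here by hard-coded RFC 1035 limits — a label is at most 63 octets and a full name at most 255 octets; step 2 mechanically generates inputs that violate those constraints, and a `violates_constraints` helper stands in for the LLM-synthesized filtering code. The function and input names are illustrative, not the paper's actual artifacts.

```python
# Sketch of LLM-driven extremal testing for DNS (assumptions: RFC 1035
# length limits stand in for LLM-inferred constraints; helper names are
# hypothetical, not from the paper).

DNS_MAX_LABEL = 63   # RFC 1035: a single label is at most 63 octets
DNS_MAX_NAME = 255   # RFC 1035: a full domain name is at most 255 octets

def extremal_dns_names():
    """Step 2: generate inputs that deliberately violate the constraints."""
    yield "a" * (DNS_MAX_LABEL + 1) + ".example.com"  # over-long label
    yield ".".join(["ab"] * 100) + ".example.com"     # over-long full name
    yield ""                                          # empty name
    yield ".........."                                # empty labels

def violates_constraints(name: str) -> bool:
    """Stand-in for LLM-generated filtering code that rejects extremal input."""
    if not name or len(name) > DNS_MAX_NAME:
        return True
    labels = name.rstrip(".").split(".")
    return any(label == "" or len(label) > DNS_MAX_LABEL for label in labels)

# Every generated extremal case should be caught by the filter.
for name in extremal_dns_names():
    assert violates_constraints(name)
```

In the paper's workflow both halves would come from LLM prompts; the point of the sketch is only that once a constraint is stated explicitly, violating it and filtering for it are mechanical.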