🤖 AI Summary
Automating crash fixes in the Linux kernel—comprising 50K files and 20M lines of code—remains a formidable challenge for large language models (LLMs).
Method: This paper introduces CrashFixer, the first LLM agent framework tailored for kernel-level crash repair. It implements a closed-loop debugging workflow inspired by kernel developers’ practices, featuring a “hypothesize-then-repair” reasoning strategy. The framework integrates dynamic context-aware retrieval, automated crash log parsing and reproduction, system-level execution via kGymSuite, and Code-LLM-driven hypothesis validation and patch generation.
Contributions/Results: (1) The first LLM-based repair agent applicable to real-world Linux kernel crashes; (2) kGymSuite, an extension of the kGym platform with the system improvements needed to run LLM agents at the scale of the Linux kernel, which will be open-sourced; (3) An evaluation of several repair strategies showing the value of explicitly generating a hypothesis before attempting a fix, plus patch suggestions for still-open bugs, at least two of which were considered plausible fixes for the reported bug.
📝 Abstract
Code large language models (LLMs) have shown impressive capabilities on a multitude of software engineering tasks. In particular, they have demonstrated remarkable utility in the task of code repair. However, common benchmarks used to evaluate the performance of code LLMs are often limited to small-scale settings. In this work, we build upon kGym, which provides a benchmark for system-level Linux kernel bugs and a platform to run experiments on the Linux kernel. This paper introduces CrashFixer, the first LLM-based software repair agent that is applicable to Linux kernel bugs. Inspired by the typical workflow of a kernel developer, we identify the key capabilities an expert developer leverages to resolve a kernel crash. Using this as our guide, we revisit the kGym platform and identify key system improvements needed to practically run LLM-based agents at the scale of the Linux kernel (50K files and 20M lines of code). We implement these changes by extending kGym to create an improved platform, called kGymSuite, which will be open-sourced. Finally, the paper presents an evaluation of various repair strategies for such complex kernel bugs and showcases the value of explicitly generating a hypothesis before attempting to fix bugs in complex systems such as the Linux kernel. We also evaluated CrashFixer's capabilities on still-open bugs and found at least two patch suggestions considered plausible to resolve the reported bug.
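
To make the idea of generating a hypothesis before attempting a fix more concrete, the following is a minimal Python sketch of such a loop. It is not CrashFixer's implementation: the function name `hypothesize_then_repair`, the `Attempt` record, the three callables, and the `max_rounds` limit are all hypothetical names chosen for illustration. In a real setup the callables would presumably wrap an LLM and a kernel build-and-reproduce run.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Attempt:
    """One round of the loop (hypothetical record, not from the paper)."""
    hypothesis: str
    patch: str
    crashed: bool


def hypothesize_then_repair(
    crash_log: str,
    propose_hypothesis: Callable[[str, list], str],  # assumed: LLM call
    propose_patch: Callable[[str, str], str],        # assumed: LLM call
    reproduces_crash: Callable[[str], bool],         # assumed: build + run reproducer
    max_rounds: int = 3,
) -> Optional[Attempt]:
    """Return the first attempt whose patch stops the crash, else None."""
    history: list = []
    for _ in range(max_rounds):
        # Step 1: state an explicit root-cause hypothesis, informed by
        # the crash log and by earlier failed attempts.
        hypothesis = propose_hypothesis(crash_log, history)
        # Step 2: generate a patch conditioned on that hypothesis.
        patch = propose_patch(crash_log, hypothesis)
        # Step 3: close the loop by checking whether the crash still reproduces.
        crashed = reproduces_crash(patch)
        attempt = Attempt(hypothesis, patch, crashed)
        if not crashed:
            return attempt
        history.append(attempt)
    return None
```

Passing the history of failed attempts back into the hypothesis step is one possible design choice; it lets later rounds avoid repeating a root-cause guess that has already been ruled out.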