🤖 AI Summary
Traditional file systems struggle to adapt efficiently to emerging hardware and application requirements: they suffer from high development overhead, fragile evolution, and ambiguous specifications. This paper proposes a generative file system paradigm centered on large language models (LLMs), which automatically constructs and evolves file systems from formal, multi-part specifications. Its key contributions are: (1) SYSSPEC, a multi-part formal specification framework for file systems covering functionality, modularity, and concurrency; (2) DAG-structured specification patches that add features without violating existing invariants, enabling auditable, versioned evolution; and (3) a hallucination-resistant LLM agent toolkit that integrates generation, verification, and repair. In the evaluation, SPECFS, a concurrent file system generated by LLMs, passes hundreds of regression tests, matching a manually coded baseline, and integrates ten real-world Ext4 features, demonstrating both robust evolvability and practical engineering viability.
📝 Abstract
File systems are critical OS components that require constant evolution to support new hardware and emerging application needs. However, the traditional paradigm of developing features, fixing bugs, and maintaining the system incurs significant overhead, especially as systems grow in complexity. This paper proposes a new paradigm, generative file systems, which leverages Large Language Models (LLMs) to generate and evolve a file system from prompts, effectively addressing the need for robust evolution. Despite the widespread success of LLMs in code generation, attempts to create a functional file system have thus far been unsuccessful, mainly due to the ambiguity of natural language prompts.
This paper introduces SYSSPEC, a framework for developing generative file systems. Its key insight is to replace ambiguous natural language with principles adapted from formal methods. Instead of imprecise prompts, SYSSPEC employs a multi-part specification that precisely describes a file system's functionality, modularity, and concurrency. The specification acts as an unambiguous blueprint, guiding LLMs to generate the expected code while leaving them flexibility in how to implement it. To manage evolution, we develop DAG-structured patches that operate on the specification itself, enabling new features to be added without violating existing invariants. Moreover, the SYSSPEC toolchain includes a set of LLM-based agents with mechanisms to mitigate hallucination during construction and evolution. We demonstrate our approach by generating SPECFS, a concurrent file system. SPECFS passes hundreds of regression tests, matching a manually coded baseline. We further confirm its evolvability by seamlessly integrating 10 real-world features from Ext4. Our work shows that a specification-guided approach makes generating and evolving complex systems not only feasible but also highly effective.
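To make the idea of DAG-structured specification patches more concrete, here is a minimal sketch, not SYSSPEC's actual design. It assumes that a specification can be modeled as a mapping from module names to spec text, that each patch lists the patches it depends on, and that invariants are predicates over the whole specification. `Spec`, `SpecPatch`, `depends_on`, `changes`, and `apply_patches` are hypothetical names introduced only for illustration. The sketch applies patches in a topological order of the dependency DAG and rejects any patch whose result breaks an invariant.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Callable

# Hypothetical model: a specification maps module names to their spec text.
Spec = dict[str, str]


@dataclass
class SpecPatch:
    """A hypothetical patch that overrides parts of the specification."""
    name: str
    depends_on: list[str] = field(default_factory=list)
    changes: Spec = field(default_factory=dict)


def apply_patches(base: Spec, patches: list[SpecPatch],
                  invariants: list[Callable[[Spec], bool]]) -> tuple[Spec, list[str]]:
    """Apply patches in dependency (topological) order, checking invariants after each.

    Raises graphlib.CycleError (a ValueError subclass) if the dependencies form a cycle.
    """
    by_name = {p.name: p for p in patches}
    graph = {p.name: set(p.depends_on) for p in patches}
    order = list(TopologicalSorter(graph).static_order())
    spec = dict(base)  # work on a copy so the base version stays intact
    for name in order:
        if name not in by_name:
            raise KeyError(f"unknown dependency: {name}")
        candidate = {**spec, **by_name[name].changes}
        if not all(inv(candidate) for inv in invariants):
            raise ValueError(f"patch {name!r} violates an invariant")
        spec = candidate
    return spec, order


# Usage: a directory-index patch that depends on an extent patch.
base = {"inode": "inode spec v1", "dir": "directory spec v1"}
patches = [
    SpecPatch("dir_index", depends_on=["extent"], changes={"dir": "hashed directory spec"}),
    SpecPatch("extent", changes={"inode": "inode spec with extents"}),
]
invariants = [lambda s: "inode" in s and "dir" in s]
spec, order = apply_patches(base, patches, invariants)
print(order)  # ['extent', 'dir_index']
```

Keeping each version as a new dictionary rather than mutating the previous one is one simple way to retain every intermediate specification for auditing; a real system would likely also record which patches produced each version.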