🤖 AI Summary
Problem: Existing verified lexers guarantee conformance to regular-expression semantics and the maximal munch rule, but they do not guarantee that lexing and printing are exact inverses. Method: ZipLex is a verified framework for invertible (bidirectional) lexical analysis. It relies on two sets of ideas. The first is a new token-sequence abstraction that captures whether the tokens in a sequence are separable, while still supporting efficient manipulation. The second is a set of verified data structures and optimizations, including Huet's zippers and memoized derivatives, that give practical performance. The framework is implemented in Scala and verified with Stainless, including the invertibility property. It supports realistic applications such as JSON processing and programming-language lexers. Contribution/Results: In the evaluation, ZipLex is two orders of magnitude faster than Verbatim++ and 4x slower than Coqlex. Both are verified lexers that do not enforce invertibility, so the results show that verified invertibility can be achieved without prohibitive cost.
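The memoized derivatives mentioned above can be illustrated with a minimal sketch. It uses Brzozowski derivatives over a plain regular-expression AST, with a memo table keyed on (state, character). It does not model the zipper representation that ZipLex combines with memoization. All names (`Regex`, `deriv`, `MemoDeriv`, `matches`) are hypothetical, not ZipLex's API, and nothing here is verified.

```scala
import scala.collection.mutable

object MemoizedDerivatives {
  // Regular expressions as an algebraic data type (hypothetical, not ZipLex's types).
  sealed trait Regex
  case object EmptySet extends Regex                     // matches nothing
  case object Eps extends Regex                          // matches only ""
  final case class Chr(c: Char) extends Regex            // matches one character
  final case class Alt(a: Regex, b: Regex) extends Regex // a | b
  final case class Cat(a: Regex, b: Regex) extends Regex // a followed by b
  final case class Star(a: Regex) extends Regex          // zero or more a

  // Does r accept the empty string?
  def nullable(r: Regex): Boolean = r match {
    case EmptySet  => false
    case Eps       => true
    case Chr(_)    => false
    case Alt(a, b) => nullable(a) || nullable(b)
    case Cat(a, b) => nullable(a) && nullable(b)
    case Star(_)   => true
  }

  // Smart constructors: simplify so that equivalent states are often structurally equal.
  def alt(a: Regex, b: Regex): Regex =
    if (a == EmptySet) b else if (b == EmptySet || a == b) a else Alt(a, b)

  def cat(a: Regex, b: Regex): Regex =
    if (a == EmptySet || b == EmptySet) EmptySet
    else if (a == Eps) b
    else if (b == Eps) a
    else Cat(a, b)

  // Brzozowski derivative: the strings w such that c followed by w is in L(r).
  def deriv(r: Regex, c: Char): Regex = r match {
    case EmptySet | Eps => EmptySet
    case Chr(d)         => if (c == d) Eps else EmptySet
    case Alt(a, b)      => alt(deriv(a, c), deriv(b, c))
    case Cat(a, b) =>
      val left = cat(deriv(a, c), b)
      if (nullable(a)) alt(left, deriv(b, c)) else left
    case Star(a) => cat(deriv(a, c), r)
  }

  // Memo table for top-level derivative steps, keyed on (state, character).
  final class MemoDeriv {
    private val cache = mutable.HashMap.empty[(Regex, Char), Regex]
    def apply(r: Regex, c: Char): Regex = cache.getOrElseUpdate((r, c), deriv(r, c))
    def size: Int = cache.size
  }

  // r matches s iff deriving r by every character of s yields a nullable state.
  def matches(r: Regex, s: String, d: MemoDeriv): Boolean =
    nullable(s.foldLeft(r)((state, ch) => d(state, ch)))
}
```

For example, `matches(Star(Cat(Chr('a'), Chr('b'))), "abab", new MemoDeriv)` returns `true` and leaves only two entries in the memo table. This is because the state after reading `ab` is structurally equal to the starting state. Without the `alt`/`cat` simplifications, derivative terms can keep growing, and the memo table would rarely hit.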
📝 Abstract
We present ZipLex, a verified framework for invertible lexical analysis. Unlike past verified lexers that focus only on satisfying the semantics of regular expressions and the maximal munch property, ZipLex also guarantees that lexing and printing are mutual inverses. Our design relies on two sets of ideas: (1) a new abstraction of token sequences that captures the separability of tokens in a sequence while supporting their efficient manipulation, and (2) a combination of verified data structures and optimizations, including Huet's zippers and memoized derivatives, to achieve practical performance. We implemented ZipLex in Scala and verified its correctness, including invertibility, using the Stainless verifier. Our evaluation demonstrates that ZipLex supports realistic applications such as JSON processing and lexers for programming languages. In comparison to other verified lexers (which do not enforce invertibility), ZipLex is 4x slower than Coqlex and two orders of magnitude faster than Verbatim++, showing that verified invertibility can be achieved without prohibitive cost.
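To make the invertibility property concrete, here is a small Scala sketch of a maximal-munch lexer and its printer, with a round-trip check `lex(print(ts)) == Some(ts)`. Several parts are hypothetical illustrations, not ZipLex's API: the rule set (keywords `if`/`else`, lowercase identifiers, spaces), the `Token` type, and all function names. The lexer finds matches by brute-force prefix search with `java.util.regex` rather than with derivatives, and nothing here is verified.

```scala
import java.util.regex.Pattern

object InvertibleLexing {
  // A token records the rule that produced it and the exact text it covers.
  final case class Token(rule: String, text: String)

  // Hypothetical rules in priority order: keywords win ties against identifiers.
  val rules: List[(String, Pattern)] = List(
    "keyword" -> Pattern.compile("if|else"),
    "ident"   -> Pattern.compile("[a-z]+"),
    "space"   -> Pattern.compile(" +")
  )

  // Length of the longest prefix of input starting at `from` that p matches in full (0 if none).
  private def longestMatch(p: Pattern, input: String, from: Int): Int =
    (from + 1 to input.length).reverse
      .find(end => p.matcher(input.substring(from, end)).matches())
      .map(_ - from)
      .getOrElse(0)

  // Maximal munch: take the longest match at each position; on equal length the
  // earlier rule wins. Returns None if some position matches no rule.
  def lex(input: String): Option[List[Token]] = {
    @annotation.tailrec
    def loop(pos: Int, acc: List[Token]): Option[List[Token]] =
      if (pos == input.length) Some(acc.reverse)
      else {
        val (rule, len) = rules.foldLeft(("", 0)) { case ((bestRule, bestLen), (name, p)) =>
          val n = longestMatch(p, input, pos)
          if (n > bestLen) (name, n) else (bestRule, bestLen)
        }
        if (len == 0) None
        else loop(pos + len, Token(rule, input.substring(pos, pos + len)) :: acc)
      }
    loop(0, Nil)
  }

  // Printing concatenates token texts; nothing is inserted between tokens.
  def printTokens(tokens: List[Token]): String = tokens.map(_.text).mkString

  // The direction that can fail: does printing and then re-lexing return the same tokens?
  def roundTrips(tokens: List[Token]): Boolean = lex(printTokens(tokens)) == Some(tokens)
}
```

Because every token stores its exact text, `printTokens(ts) == s` whenever `lex(s)` returns `Some(ts)`. The other direction is where separability matters. The identifiers `ab` and `c` print as `abc`, which re-lexes as a single identifier. Likewise, the keyword `if` followed by the identifier `x` re-lexes as the identifier `ifx`. For lexing and printing to be mutual inverses, such sequences presumably have to be excluded. This is the separability that the token-sequence abstraction above is described as capturing.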