ZipLex: Verified Invertible Lexing with Memoized Derivatives and Zippers

📅 2025-10-21

🤖 AI Summary

Problem: Existing verified lexers guarantee conformance to regular-expression semantics and the maximal munch property, but not that lexing and printing are exact inverses. Method: ZipLex is a verified framework for invertible lexing. It combines a new token-sequence abstraction, which captures when the tokens in a sequence are separable while supporting efficient manipulation, with verified data structures and optimizations, including Huet's zippers and memoized derivatives. It is implemented in Scala and verified, including invertibility, with the Stainless verifier. Contribution/Results: ZipLex supports realistic applications such as JSON processing and lexers for programming languages. Compared with other verified lexers, which do not enforce invertibility, it is 4x slower than Coqlex and two orders of magnitude faster than Verbatim++, indicating that verified invertibility is achievable without prohibitive cost.

📝 Abstract

We present ZipLex, a verified framework for invertible lexical analysis. Unlike past verified lexers that focus only on satisfying the semantics of regular expressions and the maximal munch property, ZipLex also guarantees that lexing and printing are mutual inverses. Our design relies on two sets of ideas: (1) a new abstraction of token sequences that captures the separability of tokens in a sequence while supporting their efficient manipulation, and (2) a combination of verified data structures and optimizations, including Huet's zippers and memoized derivatives, to achieve practical performance. We implemented ZipLex in Scala and verified its correctness, including invertibility, using the Stainless verifier. Our evaluation demonstrates that ZipLex supports realistic applications such as JSON processing and lexers of programming languages. In comparison to other verified lexers (which do not enforce invertibility), ZipLex is 4x slower than Coqlex and two orders of magnitude faster than Verbatim++, showing that verified invertibility can be achieved without prohibitive cost.
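The invertibility guarantee, and the reason token separability matters for it, can be illustrated with a small unverified sketch. It is written in Python for brevity and is not ZipLex's Scala/Stainless implementation; the rule set and the names `RULES`, `lex`, and `print_tokens` are hypothetical. The sketch assumes patterns simple enough that Python's `re` returns the longest match for each rule.

```python
import re

# Hypothetical token rules for illustration only (not ZipLex's rule format).
RULES = [
    ("INT", re.compile(r"[0-9]+")),
    ("ID", re.compile(r"[a-z]+")),
    ("PLUS", re.compile(r"\+")),
    ("WS", re.compile(r" +")),
]

def lex(text):
    """Maximal munch: at each position take the longest match;
    ties go to the earliest rule in RULES."""
    tokens, pos = [], 0
    while pos < len(text):
        best = None
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            if m and m.end() > pos and (best is None or len(m.group()) > len(best[1])):
                best = (name, m.group())
        if best is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append(best)
        pos += len(best[1])
    return tokens

def print_tokens(tokens):
    """Printing concatenates the lexemes of the tokens."""
    return "".join(lexeme for _, lexeme in tokens)

# Direction 1: printing the result of lexing restores the input.
source = "x + 42"
tokens = lex(source)
assert print_tokens(tokens) == source

# Direction 2: lexing the printed tokens restores the tokens,
# which holds here because adjacent tokens are separable.
assert lex(print_tokens(tokens)) == tokens

# Two adjacent INT tokens are not separable: maximal munch merges them.
adjacent = [("INT", "4"), ("INT", "2")]
relexed = lex(print_tokens(adjacent))
assert relexed != adjacent
```

In this toy setting the second direction fails for `adjacent` because printing erases the boundary between the two tokens. This is the kind of situation a separability condition on token sequences has to rule out before a round-trip property can be stated.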

Problem

Research questions and friction points this paper is trying to address.

Ensuring lexing and printing are mutual inverses

Achieving practical performance with verified invertibility

Supporting realistic applications like JSON processing

Innovation

Methods, ideas, or system contributions that make the work stand out.

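Introduces a token-sequence abstraction that captures token separability while supporting efficient manipulation

Combines verified data structures and optimizations, including Huet's zippers and memoized derivatives, for practical performance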

Verifies with Stainless that lexing and printing are mutual inverses
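As background for the memoized-derivatives item above, the following is a minimal sketch of matching with Brzozowski derivatives, where results are cached so that the same (regex, character) pair is never derived twice. It is an unverified Python illustration, not the paper's construction: the tuple encoding, the helper names, and the use of `functools.lru_cache` as the memo table are all assumptions, and the zipper-based representation is not shown.

```python
from functools import lru_cache, reduce

# Regexes are nested tuples so they are hashable and can serve as cache keys.
EMPTY = ("empty",)  # matches no string
EPS = ("eps",)      # matches only the empty string

def char(c):
    return ("char", c)

# Smart constructors apply simple identities to keep derivatives small.
def seq(a, b):
    if a == EMPTY or b == EMPTY:
        return EMPTY
    if a == EPS:
        return b
    if b == EPS:
        return a
    return ("seq", a, b)

def alt(a, b):
    if a == EMPTY:
        return b
    if b == EMPTY:
        return a
    return a if a == b else ("alt", a, b)

def star(a):
    if a in (EMPTY, EPS):
        return EPS
    return a if a[0] == "star" else ("star", a)

@lru_cache(maxsize=None)
def nullable(r):
    """True if r accepts the empty string."""
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag in ("empty", "char"):
        return False
    if tag == "seq":
        return nullable(r[1]) and nullable(r[2])
    return nullable(r[1]) or nullable(r[2])  # alt

@lru_cache(maxsize=None)
def derive(r, c):
    """Derivative of r with respect to character c (memoized)."""
    tag = r[0]
    if tag in ("empty", "eps"):
        return EMPTY
    if tag == "char":
        return EPS if r[1] == c else EMPTY
    if tag == "alt":
        return alt(derive(r[1], c), derive(r[2], c))
    if tag == "star":
        return seq(derive(r[1], c), r)
    left = seq(derive(r[1], c), r[2])  # seq case
    return alt(left, derive(r[2], c)) if nullable(r[1]) else left

def matches(r, s):
    for c in s:
        r = derive(r, c)
    return nullable(r)

def longest_prefix(r, s):
    """Length of the longest prefix of s accepted by r, or -1 if none."""
    best = 0 if nullable(r) else -1
    for i, c in enumerate(s):
        r = derive(r, c)
        if r == EMPTY:
            break
        if nullable(r):
            best = i + 1
    return best

digit = reduce(alt, [char(d) for d in "0123456789"])
number = seq(digit, star(digit))  # one or more digits

assert matches(number, "2025")
assert not matches(number, "20a")
assert longest_prefix(number, "42+x") == 2
```

`longest_prefix` is the building block a maximal-munch lexer needs: it feeds characters to the derivative until the regex becomes `EMPTY`, remembering the last position at which the derivative was nullable. Because lexing revisits the same regex states many times, the cache turns most derivative computations into lookups.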
