FLAT: Formal Languages as Types

📅 2025-01-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Generic string types lead to type confusion and security vulnerabilities, and they make semantic errors (such as malformed paths, URLs, or email addresses) hard to detect either statically or dynamically.
Method: We propose a new paradigm, "formal languages as types". Context-free grammars define syntactic structure, and semantic constraints (e.g., pre- and post-conditions) refine string subtypes. We implement language-aware runtime type checking and contract-driven, grammar-guided fuzz testing.
Contribution: We introduce a string type system that jointly models syntax and semantics and detects violations automatically with a modest number of user annotations. Evaluated on real-world Python code fragments, the approach identifies logical errors in string manipulations and stops erroneous values from propagating, which improves the safety and reliability of string-intensive code.
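The core idea of a language type refined by contracts can be illustrated with a minimal Python sketch. This is not FLAT-PY's API. The names LangType, Email, checked, and domain_of are hypothetical. The toy email grammar happens to be regular, so a regular expression stands in for a full context-free parser. On every call, the decorator checks each annotated argument against its language type and then evaluates the pre- and post-conditions. As a result, an ill-formed string is rejected at the call boundary instead of flowing further into the program.

```python
import inspect
import re
from functools import wraps
from typing import Callable, Optional


class LangType:
    """A string type given by a membership test (hypothetical, not FLAT-PY's API)."""

    def __init__(self, name: str, pattern: str) -> None:
        self.name = name
        self._regex = re.compile(pattern)

    def __contains__(self, value: object) -> bool:
        # A value belongs to the type only if it is a str matched in full.
        return isinstance(value, str) and self._regex.fullmatch(value) is not None


# Toy email grammar: <local> "@" <label> ("." <label>)+ ; it is regular, so a regex decides it.
Email = LangType("Email", r"[A-Za-z0-9._]+@[A-Za-z0-9]+(?:\.[A-Za-z0-9]+)+")


def checked(pre: Optional[Callable[..., bool]] = None,
            post: Optional[Callable[..., bool]] = None,
            **arg_types: LangType):
    """Check argument language types and pre-/post-conditions on every call (hypothetical)."""
    def decorate(fn):
        sig = inspect.signature(fn)

        @wraps(fn)
        def wrapper(*args, **kwargs):
            bound = sig.bind(*args, **kwargs)
            bound.apply_defaults()
            # 1. Language-type check: reject ill-formed strings at the boundary.
            for name, lang in arg_types.items():
                value = bound.arguments[name]
                if value not in lang:
                    raise TypeError(f"{name}={value!r} is not a valid {lang.name}")
            # 2. Precondition over the named arguments.
            if pre is not None and not pre(**bound.arguments):
                raise AssertionError(f"precondition of {fn.__name__} violated")
            result = fn(*args, **kwargs)
            # 3. Postcondition over the result and the named arguments.
            if post is not None and not post(result, **bound.arguments):
                raise AssertionError(f"postcondition of {fn.__name__} violated")
            return result
        return wrapper
    return decorate


@checked(post=lambda result, address: address.endswith("@" + result), address=Email)
def domain_of(address: str) -> str:
    return address.split("@", 1)[1]
```

For example, domain_of("alice@example.com") returns "example.com", while domain_of("<b>hi</b>") raises TypeError before the function body runs. A genuinely context-free language type would need a general parser (e.g., an Earley parser) in place of re.fullmatch.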

📝 Abstract
Programmers regularly use strings to encode many conceptually different kinds of data, such as Unix file paths, URLs, and email addresses. However, mainstream programming languages use a single string type to represent them all. As a result, their type systems stay silent when a function expecting an email address is instead given HTML text, which may cause silent failures (ones that raise no exception) or security vulnerabilities. To let the type system distinguish such conceptually different string types, in this paper we propose to regard formal languages as types (FLAT). This restricts the set of valid strings by context-free grammars and, where needed, semantic constraints. In this way, email addresses and HTML text are treated as different types. We realize this idea in Python as a testing framework, FLAT-PY. It provides user annotations, attached directly to the user's code, to (1) define such language types, (2) specify pre-/post-conditions that serve as semantic oracles or contracts for functions, and (3) fuzz functions with random string inputs produced by a language-based fuzzer. From these annotations, FLAT-PY automatically checks type correctness at runtime via code instrumentation. It reports any detected type error as early as possible, which prevents bugs from propagating deep into other parts of the code. Case studies on real Python code fragments show that FLAT-PY is able to catch logical bugs from random inputs while requiring a reasonable amount of user annotations.
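The language-based fuzzing step can be sketched in a similar way. The approach is to expand a context-free grammar at random to produce valid inputs, run the function on each one, and stop at the first input whose postcondition fails. This is a generic grammar-based generator written for illustration, assuming a dict-of-alternatives grammar format. GRAMMAR, generate, fuzz, postcondition, and the deliberately buggy domain_of are hypothetical and do not reflect FLAT-PY's implementation.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

# Toy context-free grammar (assumed dict-of-alternatives format, not FLAT-PY's).
# Keys are nonterminals; each alternative is a sequence of symbols.
GRAMMAR: Dict[str, List[List[str]]] = {
    "<email>": [["<local>", "@", "<domain>"]],
    "<local>": [["<char>"], ["<char>", "<local>"]],
    "<domain>": [["<label>", ".", "<label>"], ["<label>", ".", "<domain>"]],
    "<label>": [["<char>"], ["<char>", "<label>"]],
    "<char>": [[c] for c in "abcxyz019"],
}


def generate(symbol: str, rng: random.Random, depth: int = 0, max_depth: int = 12) -> str:
    """Expand `symbol` with random alternatives; past max_depth, take the shortest one."""
    if symbol not in GRAMMAR:
        return symbol  # terminal symbol: emit as-is
    alternatives = GRAMMAR[symbol]
    if depth >= max_depth:
        # min() keeps the first shortest alternative, which is non-recursive here.
        alternatives = [min(alternatives, key=len)]
    choice = rng.choice(alternatives)
    return "".join(generate(s, rng, depth + 1, max_depth) for s in choice)


def domain_of(address: str) -> str:
    # Deliberately buggy: returns only the last dot-separated label of the domain.
    return address.rsplit(".", 1)[-1]


def postcondition(address: str, result: str) -> bool:
    # Semantic oracle: the result must be exactly the part after "@".
    return address.endswith("@" + result)


def fuzz(fn: Callable[[str], str],
         post: Callable[[str, str], bool],
         start: str = "<email>",
         trials: int = 1000,
         seed: int = 0) -> Optional[Tuple[str, str]]:
    """Return the first (input, output) pair violating `post`, or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        inp = generate(start, rng)
        out = fn(inp)
        if not post(inp, out):
            return inp, out
    return None
```

Every generated address has a dotted domain, so the buggy domain_of violates the postcondition on the first trial, and fuzz returns that input together with its bad output. A corrected version, lambda a: a.split("@", 1)[1], passes all trials, and fuzz returns None.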
Problem

Research questions and friction points this paper is trying to address.

Type Safety
String Handling
Programming Languages
Innovation

Methods, ideas, or system contributions that make the work stand out.

Formal Language Typing
Automated Error Detection
Random String Generation