Syntax Is Easy, Semantics Is Hard: Evaluating LLMs for LTL Translation

📅 2026-04-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the bottleneck in translating natural language to Linear Temporal Logic (LTL), which hinders LTL’s application in security and privacy analysis. The authors systematically evaluate the ability of prominent large language models to translate English assertions into propositional LTL, constructing a benchmark that assesses both syntactic correctness and semantic accuracy by combining human-annotated and synthetically generated data. Their findings reveal that current models generally exhibit stronger syntactic performance than semantic understanding. However, reframing the translation task as Python code completion—augmented with carefully engineered prompts—significantly improves translation accuracy. This work highlights the limitations of existing models in formal semantic mapping and proposes an effective paradigm for enhancing their performance in this critical domain.
📝 Abstract
Propositional Linear Temporal Logic (LTL) is a popular formalism for specifying desirable requirements and security and privacy policies for software, networks, and systems. Yet expressing such requirements and policies in LTL remains challenging because of its intricate semantics. Since many security and privacy analysis tools require LTL formulas as input, this difficulty places them out of reach for many developers and analysts. Large Language Models (LLMs) could broaden access to such tools by translating natural language fragments into LTL formulas. This paper evaluates that premise by assessing how effectively several representative LLMs translate assertive English sentences into LTL formulas. Using both human-generated and synthetic ground-truth data, we evaluate effectiveness along syntactic and semantic dimensions. The results reveal three findings: (1) in line with prior findings, LLMs perform better on syntactic aspects of LTL than on semantic ones; (2) they generally benefit from more detailed prompts; and (3) reformulating the task as a Python code-completion problem substantially improves overall performance. We also discuss challenges in conducting a fair evaluation on this task and conclude with recommendations for future work.
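To make the code-completion reformulation concrete, here is a minimal sketch of what that framing might look like. The paper does not publish its exact prompts here, so all names (`LTL`, `ap`, `G`, `F`, `implies`, the docstring-based template) are illustrative assumptions: the idea is that instead of asking the LLM to emit a raw LTL string, the prompt presents a partially written Python program that constructs the formula from operator combinators, and the model completes the return expression.

```python
# Hypothetical illustration of the "Python code completion" framing for
# NL-to-LTL translation. All identifiers are invented for this sketch;
# they are not the paper's actual prompt or API.
from dataclasses import dataclass


@dataclass(frozen=True)
class LTL:
    """A tiny LTL syntax tree: atomic propositions plus a few operators."""
    op: str
    args: tuple = ()

    def __str__(self):
        if self.op in ("G", "F", "X"):          # unary temporal operators
            return f"{self.op}({self.args[0]})"
        if self.op in ("U", "->", "&", "|"):    # binary operators
            return f"({self.args[0]} {self.op} {self.args[1]})"
        return self.op                           # atomic proposition


def ap(name): return LTL(name)                  # atomic proposition
def G(p): return LTL("G", (p,))                 # "globally" / always
def F(p): return LTL("F", (p,))                 # "finally" / eventually
def implies(p, q): return LTL("->", (p, q))


# The English assertion is placed in a docstring, and the LLM is asked to
# complete the function body using the combinators defined above.
PROMPT = '''def formula():
    """Every request is eventually followed by a response."""
    return '''

# One plausible completion for the assertion in the docstring:
completion = G(implies(ap("request"), F(ap("response"))))
print(completion)  # G((request -> F(response)))
```

Framing the task this way lets syntactic well-formedness be enforced by the host language (an ill-typed completion simply fails to run), so evaluation can focus on the harder semantic question of whether the formula matches the English assertion.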
Problem

Research questions and friction points this paper is trying to address.

LTL translation
Large Language Models
natural language to formal specification
temporal logic semantics
formal methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

LTL translation
Large Language Models
semantic evaluation
code-completion formulation
formal specification