Bounded PCTL Model Checking of Large Language Model Outputs

📅 2025-09-23

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge of formally verifying the probabilistic behavior of large language models (LLMs) during text generation. The authors propose LLMCHECKER, which they describe as the first framework, to their knowledge, to apply probabilistic computation tree logic (PCTL) model checking to LLM output verification. Its core idea is α-k-bounded generation: at each decoding step, only the top-k tokens are considered, and the threshold α further restricts them to those needed to reach a cumulative probability mass of at least α. Bounding the generation paths in this way keeps the resulting model small enough to verify. The framework combines PCTL property specification, top-k token selection, cumulative-probability pruning, and quantitative text analysis (for example, text quality and bias scores), and is demonstrated on models including Llama, Gemma, and Mistral. Experiments show that LLMCHECKER can verify behavioral properties such as generation consistency, quality stability, and bias propensity for α-k-bounded LLMs.

📝 Abstract

In this paper, we introduce LLMCHECKER, a model-checking-based verification method to verify the probabilistic computation tree logic (PCTL) properties of an LLM text generation process. We empirically show that only a limited number of tokens are typically chosen during text generation, which are not always the same. This insight drives the creation of $α$-$k$-bounded text generation, narrowing the focus to the $α$ maximal cumulative probability on the top-$k$ tokens at every step of the text generation process. Our verification method considers an initial string and the subsequent top-$k$ tokens while accommodating diverse text quantification methods, such as evaluating text quality and biases. The threshold $α$ further reduces the selected tokens, only choosing those that exceed or meet it in cumulative probability. LLMCHECKER then allows us to formally verify the PCTL properties of $α$-$k$-bounded LLMs. We demonstrate the applicability of our method in several LLMs, including Llama, Gemma, Mistral, Genstruct, and BERT. To our knowledge, this is the first time PCTL-based model checking has been used to check the consistency of the LLM text generation process.
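The α-k bounding step described in the abstract can be illustrated with a minimal Python sketch. It assumes one reading of the abstract: sort the next-token distribution, keep the top-k tokens, then keep the shortest prefix of those whose cumulative probability reaches α. The function names `alpha_k_bounded` and `renormalize`, the toy distribution, and the renormalization step itself are illustrative assumptions, not taken from the paper.

```python
from typing import Dict, List, Tuple


def alpha_k_bounded(probs: Dict[str, float], k: int, alpha: float) -> List[Tuple[str, float]]:
    """Keep the shortest prefix of the top-k tokens whose cumulative mass reaches alpha.

    If the top-k tokens together never reach alpha, all k tokens are kept.
    (Assumed interpretation of alpha-k bounding; the paper may differ in details.)
    """
    # Rank tokens by probability, highest first, and cut to the top k.
    top_k = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    kept: List[Tuple[str, float]] = []
    mass = 0.0
    for token, p in top_k:
        kept.append((token, p))
        mass += p
        if mass >= alpha:  # threshold met: stop adding tokens
            break
    return kept


def renormalize(kept: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Rescale the kept probabilities to sum to 1 (an assumption, not from the paper)."""
    total = sum(p for _, p in kept)
    return [(token, p / total) for token, p in kept]


# Hypothetical next-token distribution for a single decoding step.
step_probs = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}
selected = alpha_k_bounded(step_probs, k=3, alpha=0.7)
print(selected)               # the two most likely tokens, "a" and "b"
print(renormalize(selected))
```

With k = 3 and α = 0.7, token "c" is inside the top-k but is dropped because "a" and "b" already cover the threshold, which is how α reduces the selected set below k.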

Problem

Research questions and friction points this paper is trying to address.

Verifying probabilistic computation tree logic properties of LLM text generation

Exploiting the observation that only a limited, varying set of tokens is typically chosen during the text generation process

Formally checking consistency of bounded LLM outputs using model checking

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses α-k-bounded text generation for verification

Verifies PCTL properties on top-k tokens

Applies model checking to LLM outputs
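To make the combination of these three ideas concrete, the sketch below evaluates a bounded PCTL reachability property of the form P≥θ [ F≤n φ ] by exhaustively exploring the α-k-bounded generation tree. It is a toy under stated assumptions, not the LLMCHECKER implementation: `toy_dist` stands in for an LLM's next-token distribution, `mentions_good` stands in for a text quantification method, and the kept probabilities are renormalized at each step. A real setup would more likely export the bounded model to a probabilistic model checker.

```python
from typing import Callable, Dict, Tuple

Prefix = Tuple[str, ...]


def bounded_reach_prob(
    prefix: Prefix,
    steps: int,
    dist: Callable[[Prefix], Dict[str, float]],
    satisfies: Callable[[Prefix], bool],
    k: int,
    alpha: float,
) -> float:
    """Probability that `satisfies` holds within `steps` more tokens (F<=steps)."""
    if satisfies(prefix):
        return 1.0  # property already holds on this path
    if steps == 0:
        return 0.0  # step bound exhausted without satisfying the property
    # alpha-k bounding: top-k tokens, cut once cumulative mass reaches alpha.
    ranked = sorted(dist(prefix).items(), key=lambda kv: kv[1], reverse=True)[:k]
    kept, mass = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        mass += p
        if mass >= alpha:
            break
    # Renormalize over the kept tokens (an assumption) and recurse on each branch.
    return sum(
        (p / mass) * bounded_reach_prob(prefix + (token,), steps - 1, dist, satisfies, k, alpha)
        for token, p in kept
    )


def toy_dist(prefix: Prefix) -> Dict[str, float]:
    """Hypothetical stand-in for an LLM's next-token distribution."""
    return {"good": 0.6, "bad": 0.3, "meh": 0.1}


def mentions_good(prefix: Prefix) -> bool:
    """Hypothetical text property: the generated text contains the token 'good'."""
    return "good" in prefix


# Check P>=0.85 [ F<=2 mentions_good ] on the alpha-k-bounded toy model.
p = bounded_reach_prob((), 2, toy_dist, mentions_good, k=2, alpha=0.8)
print(round(p, 3), p >= 0.85)  # probability is 8/9, about 0.889, so the property holds
```

The recursion visits at most k^n paths for a step bound of n, which is why the bounds k and α matter for tractability.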
