🤖 AI Summary
This work addresses the challenge of enforcing temporal safety constraints throughout the lifecycle of black-box AI systems, such as large language models (LLMs), which are inherently difficult to verify. The paper proposes an offline auditing and online monitoring framework that combines Linear Temporal Logic (LTL) with machine learning. The framework checks temporally extended behavioral specifications and adds two runtime components: a sampling-based predictive monitor and an intervening monitor that preempt and mitigate predicted policy violations. Experimental results show that the approach significantly outperforms LLM-based evaluators in detecting temporal violations. Notably, with this approach even small labeler models match or exceed frontier LLM judges, and the intervention mechanism substantially reduces violation rates while largely preserving task performance.
📝 Abstract
We examine one particular dimension of AI governance: how to monitor and audit AI-enabled products and services throughout the AI development lifecycle, from pre-deployment testing to post-deployment auditing. Combining principles from formal methods with state-of-the-art (SoTA) machine learning, we propose techniques that enable developers of AI-enabled products and services, as well as third-party AI developers and evaluators, to perform offline auditing and online (runtime) monitoring of product-specific, temporally extended behavioral constraints, such as safety constraints, norms, rules, and regulations, with respect to black-box advanced AI systems, notably LLMs. We further provide practical techniques for predictive monitoring, such as sampling-based methods, and we introduce intervening monitors that act at runtime to preempt and potentially mitigate predicted violations. Experimental results show that by exploiting the formal syntax and semantics of Linear Temporal Logic (LTL), our proposed auditing and monitoring techniques are superior to LLM baseline methods in detecting violations of temporally extended behavioral constraints; with our approach, even small-model labelers match or exceed frontier LLM judges. Our predictive and intervening monitors significantly reduce the violation rates of LLM-based agents while largely preserving task performance. We further show through controlled experiments that the accuracy of LLMs' temporal reasoning degrades markedly as event distance, the number of constraints, and the number of propositions increase.
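To make the three ideas concrete (offline auditing, sampling-based prediction, and intervention), here is a minimal sketch under stated assumptions, not the authors' implementation. It uses formula progression, a standard way to evaluate LTL incrementally over a finite trace: after each observed event, the constraint is rewritten into the obligation that still remains, and reaching `False` means a definite violation. The tuple encoding, the function names (`progress`, `first_violation`, `violation_risk`, `intervene`), the `sample_future` sampler (a hypothetical stand-in for sampling continuations from an LLM agent), and the risk `threshold` are all illustrative assumptions.

```python
import random
from typing import Callable, FrozenSet, List, Union

# Hypothetical encoding (assumption for this sketch): formulas are True/False or
# ("ap", p), ("not", f), ("and", f, g), ("or", f, g), ("next", f),
# ("until", f, g), ("always", f), ("eventually", f).
Formula = Union[bool, tuple]
Event = FrozenSet[str]  # propositions that hold at one step of the trace
Sampler = Callable[[random.Random], List[Event]]  # hypothetical future sampler


def mk_and(f: Formula, g: Formula) -> Formula:
    # Simplify constants so definite verdicts surface as True/False.
    if f is False or g is False:
        return False
    if f is True:
        return g
    if g is True:
        return f
    return ("and", f, g)


def mk_or(f: Formula, g: Formula) -> Formula:
    if f is True or g is True:
        return True
    if f is False:
        return g
    if g is False:
        return f
    return ("or", f, g)


def progress(f: Formula, e: Event) -> Formula:
    """Formula progression: the obligation that remains after observing event e."""
    if isinstance(f, bool):
        return f
    op = f[0]
    if op == "ap":
        return f[1] in e
    if op == "not":
        r = progress(f[1], e)
        return (not r) if isinstance(r, bool) else ("not", r)
    if op == "and":
        return mk_and(progress(f[1], e), progress(f[2], e))
    if op == "or":
        return mk_or(progress(f[1], e), progress(f[2], e))
    if op == "next":
        return f[1]
    if op == "always":  # G f  ==  f now, and G f from the next step
        return mk_and(progress(f[1], e), f)
    if op == "eventually":  # F f  ==  f now, or F f from the next step
        return mk_or(progress(f[1], e), f)
    if op == "until":  # f U g  ==  g now, or (f now and f U g next)
        return mk_or(progress(f[2], e), mk_and(progress(f[1], e), f))
    raise ValueError(f"unknown operator {op!r}")


def first_violation(f: Formula, trace: List[Event]) -> int:
    """Offline audit: index of the event at which f becomes violated, or -1.
    A residual (non-False) formula at the end means 'not violated yet'."""
    for i, e in enumerate(trace):
        f = progress(f, e)
        if f is False:
            return i
    return -1


def violation_risk(f: Formula, sample_future: Sampler,
                   n_samples: int = 200, seed: int = 0) -> float:
    """Predictive monitor: fraction of sampled continuations that violate f."""
    rng = random.Random(seed)
    hits = sum(first_violation(f, sample_future(rng)) >= 0 for _ in range(n_samples))
    return hits / n_samples


def intervene(f: Formula, proposed: Event, fallbacks: List[Event],
              sample_future: Sampler, threshold: float = 0.2) -> Event:
    """Intervening monitor: keep the proposed action unless its predicted
    violation risk exceeds threshold; otherwise pick the lowest-risk fallback."""
    def risk(action: Event) -> float:
        residual = progress(f, action)
        return 1.0 if residual is False else violation_risk(residual, sample_future)
    if risk(proposed) <= threshold:
        return proposed
    return min(fallbacks, key=risk)


# Example constraint: "whenever pay occurs, refund must not occur at the next step",
# i.e. G(pay -> X !refund), written as G(!pay | X !refund).
spec = ("always", ("or", ("not", ("ap", "pay")), ("next", ("not", ("ap", "refund")))))
print(first_violation(spec, [frozenset({"pay"}), frozenset({"refund"}), frozenset()]))  # → 1
always_refund: Sampler = lambda rng: [frozenset({"refund"})]  # toy sampler
print(intervene(spec, frozenset({"pay"}), [frozenset({"wait"})], always_refund))  # → frozenset({'wait'})
```

The design choice worth noting is that progression only ever reports a violation once it is certain on the observed prefix; anything still pending stays as a residual formula, which is exactly what the sampling step scores when predicting future violations.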