🤖 AI Summary
In highly regulated domains such as finance and healthcare, the inherent stochasticity of large language models (LLMs) impedes compliant deployment, because natural language policies must be formalized and statements checked against them rigorously. This paper proposes a two-stage neuro-symbolic framework. First, an LLM formalizes the natural language policies, with optional human guidance for fine-grained control over the formalization. Second, *inference-time autoformalization* translates natural language statements into logic and validates them against the formalized policies. When correctness is paramount, the second stage runs multiple redundant formalizations and cross-checks them for semantic equivalence. In the authors' benchmarks, the approach exceeds 99% soundness, which corresponds to a near-zero false positive rate in identifying logical validity. It also produces auditable logical artifacts that substantiate each verification outcome and can be used to improve the original text.
📝 Abstract
Large Language Models perform well at natural language interpretation and reasoning, but their inherent stochasticity limits their adoption in regulated industries like finance and healthcare that operate under strict policies. To address this limitation, we present a two-stage neurosymbolic framework that (1) uses LLMs with optional human guidance to formalize natural language policies, allowing fine-grained control of the formalization process, and (2) uses inference-time autoformalization to validate logical correctness of natural language statements against those policies. When correctness is paramount, we perform multiple redundant formalization steps at inference time, cross-checking the formalizations for semantic equivalence. Our benchmarks demonstrate that our approach exceeds 99% soundness, indicating a near-zero false positive rate in identifying logical validity. Our approach produces auditable logical artifacts that substantiate the verification outcomes and can be used to improve the original text.
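The redundant-formalization check described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes policies and statements reduce to propositional formulas, it replaces a real solver with brute-force truth-table enumeration, and every name in it (`equivalent`, `entails`, `verify`, the refund policy) is hypothetical. Each candidate stands in for one independent LLM formalization of the same statement.

```python
from itertools import product
from typing import Callable, Dict, Iterator, List, Sequence

# Assumption: a formula is modeled as a Python predicate over a truth assignment.
Assignment = Dict[str, bool]
Formula = Callable[[Assignment], bool]


def assignments(variables: Sequence[str]) -> Iterator[Assignment]:
    """Yield every truth assignment over the given variables."""
    for values in product([False, True], repeat=len(variables)):
        yield dict(zip(variables, values))


def equivalent(f: Formula, g: Formula, variables: Sequence[str]) -> bool:
    """Semantic equivalence: f and g agree under every assignment."""
    return all(f(a) == g(a) for a in assignments(variables))


def entails(policy: Formula, claim: Formula, variables: Sequence[str]) -> bool:
    """The claim holds in every assignment that satisfies the policy."""
    return all(claim(a) for a in assignments(variables) if policy(a))


def verify(policy: Formula, candidates: List[Formula], variables: Sequence[str]) -> str:
    """Cross-check redundant formalizations, then validate against the policy."""
    if not candidates:
        raise ValueError("at least one formalization is required")
    first = candidates[0]
    # If the independent formalizations disagree, refuse to give a verdict.
    if not all(equivalent(first, c, variables) for c in candidates[1:]):
        return "AMBIGUOUS"
    return "VALID" if entails(policy, first, variables) else "INVALID"


# Hypothetical policy: a refund requires a receipt and a purchase within 30 days.
VARS = ["refund", "receipt", "within_30"]
policy = lambda a: (not a["refund"]) or (a["receipt"] and a["within_30"])

# Two syntactically different formalizations of "a refund implies a receipt".
claim_a = lambda a: (not a["refund"]) or a["receipt"]
claim_b = lambda a: not (a["refund"] and not a["receipt"])
# A divergent formalization of the same sentence ("refund and receipt").
claim_c = lambda a: a["refund"] and a["receipt"]

print(verify(policy, [claim_a, claim_b], VARS))
print(verify(policy, [claim_a, claim_c], VARS))
```

Returning `AMBIGUOUS` instead of guessing when the formalizations disagree is the design choice that trades coverage for a lower false positive rate in this sketch. A real system would likely replace the truth-table loop with an SMT solver and a richer logic.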