🤖 AI Summary
This work addresses the inefficiency and high cost of compliance testing in highly regulated domains, where current practice relies on experts manually translating regulations into test cases. Large language models (LLMs) offer automation potential but are prone to hallucination, and existing hybrid approaches still require significant manual modeling effort. To overcome these limitations, the authors propose RAFT, a framework that makes explicit the tacit regulatory knowledge held by multiple LLMs. Using an adaptive purification-aggregation strategy, RAFT consolidates that knowledge into a domain meta-model, a formal requirements representation, and testability constraints, then dynamically injects these artifacts into prompts to drive requirement formalization and compliance test generation without manual modeling. Experiments in the financial, automotive, and power sectors show that RAFT achieves expert-level performance and substantially outperforms state-of-the-art methods while reducing test case generation and review time.
📝 Abstract
Compliance testing in highly regulated domains is crucial but largely manual, requiring domain experts to translate complex regulations into executable test cases. While large language models (LLMs) show promise for automation, their susceptibility to hallucination limits reliable application. Existing hybrid approaches mitigate this issue by constraining LLMs with formal models, but still rely on costly manual modeling. To address this problem, this paper proposes RAFT, a framework for requirements auto-formalization and compliance test generation that explicates tacit regulatory knowledge from multiple LLMs. RAFT employs an Adaptive Purification-Aggregation strategy to integrate this knowledge into three artifacts: a domain meta-model, a formal requirements representation, and testability constraints. These artifacts are then dynamically injected into prompts to guide high-precision requirement formalization and automated test generation. Experiments across financial, automotive, and power domains show that RAFT achieves expert-level performance and substantially outperforms state-of-the-art (SOTA) methods while reducing overall generation and review time.
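
The abstract does not specify how the Adaptive Purification-Aggregation strategy or the prompt injection is implemented. As a rough illustration of the general idea only, the sketch below substitutes a plain agreement vote for the aggregation step: a concept is kept when at least two models propose it, and the surviving concepts are inserted into a formalization prompt. The function names `aggregate` and `build_prompt`, the `min_votes` threshold, and the sample concepts, constraint, and requirement are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def aggregate(candidates_per_model, min_votes=2):
    """Keep items proposed by at least `min_votes` models.

    A simple agreement vote, used here as a stand-in for the paper's
    Adaptive Purification-Aggregation strategy (whose details are not
    given in the abstract).
    """
    votes = Counter()
    for items in candidates_per_model:
        # Normalize and deduplicate so each model votes once per item.
        votes.update({item.strip().lower() for item in items})
    return sorted(item for item, n in votes.items() if n >= min_votes)


def build_prompt(requirement, meta_model, constraints):
    """Inject the aggregated artifacts into a formalization prompt."""
    lines = [
        "Domain concepts: " + ", ".join(meta_model),
        "Testability constraints: " + "; ".join(constraints),
        "Formalize the following requirement using only the concepts above:",
        requirement,
    ]
    return "\n".join(lines)


# Hypothetical concept lists returned by three different LLMs.
concepts = aggregate([
    ["Order", "Account", "Price Limit"],
    ["order", "account", "Settlement"],
    ["Order", "Price limit", "Broker"],
])

# Hypothetical constraint and requirement text, for illustration only.
prompt = build_prompt(
    "An order above the price limit must be rejected.",
    concepts,
    ["every condition must be observable in a test"],
)
print(prompt)
```

In this toy setup, concepts proposed by only one model ("Settlement", "Broker") are dropped before the prompt is built, which mirrors in a very simplified way the goal of filtering out single-model hallucinations before the knowledge is reused.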