🤖 AI Summary
Problem: Existing RegTech systems and large language models struggle to generate accurate, executable compliance code from complex Chinese financial regulations. Method: We propose FinCheck, an end-to-end compliance verification framework featuring (i) domain-adaptive pretraining, (ii) hierarchical logical parsing, (iii) structured prompt engineering, and (iv) interpretable report generation. Contribution/Results: We introduce Compliance-to-Code, the first large-scale, fine-grained Chinese dataset for financial regulatory compliance code generation. It comprises 1,159 clauses from 361 regulations, annotated with a novel four-element schema (subject, condition, constraint, context), and each clause is paired with a deterministic Python implementation and a step-by-step reasoning explanation. Experiments show that FinCheck significantly improves code generation accuracy and logical consistency and outperforms state-of-the-art baselines across multiple compliance tasks, supporting practical automated auditing and RegTech deployment.
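To make the four-element schema and the idea of a "deterministic Python implementation" concrete, here is a minimal sketch. The clause, the 5% threshold, and every name below (`ClauseAnnotation`, `check_clause`, the record fields) are hypothetical illustrations. They are not drawn from the dataset, any real regulation, or the authors' code. The sketch shows how subject, condition, and constraint can map onto plain branching logic, while context remains descriptive metadata.

```python
from dataclasses import dataclass
from typing import Any, Dict

# Hypothetical annotation record for one clause under the four-element schema.
# Field names are illustrative, not the dataset's actual format.
@dataclass(frozen=True)
class ClauseAnnotation:
    clause_id: str
    subject: str      # who the clause applies to
    condition: str    # when the clause is triggered
    constraint: str   # what must (or must not) happen
    context: str      # supplementary information, e.g. scope or source notes

# Illustrative threshold only; not taken from any real regulation.
DISCLOSURE_THRESHOLD = 0.05

EXAMPLE_CLAUSE = ClauseAnnotation(
    clause_id="demo-1",
    subject="listed company",
    condition="a related-party transaction exceeds 5% of net assets",
    constraint="the transaction must be disclosed",
    context="hypothetical example, not from any real regulation",
)

def check_clause(record: Dict[str, Any]) -> bool:
    """Deterministic check for the hypothetical clause above.

    Returns True if the record complies (or the clause does not apply).
    """
    # Subject: the clause only applies to listed companies.
    if record["entity_type"] != "listed_company":
        return True
    # Condition: triggered only when the transaction exceeds the threshold.
    ratio = record["transaction_amount"] / record["net_assets"]
    if ratio <= DISCLOSURE_THRESHOLD:
        return True
    # Constraint: once triggered, the transaction must have been disclosed.
    return bool(record["disclosed"])
```

For example, a listed company with an undisclosed transaction of 80 against net assets of 1,000 (a ratio of 0.08) fails the check. Keeping each check a pure function of the input record makes its verdict reproducible, which is what "deterministic" buys for auditing.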
📝 Abstract
Regulatory compliance has become a cornerstone of corporate governance, ensuring adherence to systematic legal frameworks. Financial regulations in particular often comprise highly intricate provisions, layered logical structures, and numerous exceptions, which make compliance work labor-intensive and the rules themselves difficult to interpret. To mitigate this, Regulatory Technology (RegTech) and Large Language Models (LLMs) have recently attracted significant attention for automating the conversion of regulatory text into executable compliance logic. However, their performance remains suboptimal, particularly on Chinese-language financial regulations, due to three key limitations: (1) incomplete representation of domain-specific knowledge, (2) insufficient hierarchical reasoning capabilities, and (3) failure to maintain temporal and logical coherence. One promising solution is to develop domain-specific, code-oriented datasets for model training. Existing datasets such as LexGLUE, LegalBench, and CODE-ACCORD, however, are often English-focused, mismatched in domain, or lack the fine granularity needed for compliance code generation. To fill these gaps, we present Compliance-to-Code, the first large-scale Chinese dataset dedicated to financial regulatory compliance. It covers 1,159 annotated clauses from 361 regulations across ten categories. Each clause is modularly structured into four logical elements (subject, condition, constraint, and contextual information) and annotated with its relations to other regulations. We provide deterministic Python code mappings, detailed code reasoning, and code explanations to facilitate automated auditing. To demonstrate the dataset's utility, we present FinCheck, a pipeline for regulation structuring, code generation, and report generation.
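The three FinCheck stages can be pictured as a simple composition: structure each clause, generate an executable check from the structured form, run the check on a record, and collect the verdicts into a report. The sketch below assumes that shape. The stages are passed in as plain functions, and `toy_structure` and `toy_generate` are hypothetical placeholders standing in for the model-driven steps. None of the names (`run_pipeline`, `StructuredClause`, etc.) come from the paper.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Check = Callable[[Record], bool]

@dataclass
class StructuredClause:
    clause_id: str
    subject: str
    condition: str
    constraint: str
    context: str

def run_pipeline(
    clauses: Dict[str, str],
    structure: Callable[[str, str], StructuredClause],
    generate: Callable[[StructuredClause], Check],
    record: Record,
) -> str:
    """Structure each clause, generate and run its check, and build a text report."""
    lines: List[str] = []
    for clause_id, text in clauses.items():
        parsed = structure(clause_id, text)   # stage 1: regulation structuring
        check = generate(parsed)              # stage 2: code generation
        status = "PASS" if check(record) else "FAIL"
        lines.append(f"[{status}] {clause_id}: {parsed.constraint}")  # stage 3: report
    return "\n".join(lines)

def toy_structure(clause_id: str, text: str) -> StructuredClause:
    # Hypothetical placeholder: a real system would parse the text with a model.
    return StructuredClause(clause_id, "listed company", "always", text, "")

def toy_generate(clause: StructuredClause) -> Check:
    # Hypothetical placeholder: returns a fixed check instead of generated code.
    return lambda r: bool(r.get("disclosed", False))

if __name__ == "__main__":
    report = run_pipeline(
        {"C1": "Disclose related-party transactions"},
        toy_structure, toy_generate, {"disclosed": True},
    )
    print(report)  # [PASS] C1: Disclose related-party transactions
```

Injecting the stages as functions keeps the orchestration independent of how structuring and code generation are actually performed, so either stage can be swapped out without touching the report logic.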