Compliance-to-Code: Enhancing Financial Compliance Checking via Code Generation

📅 2025-05-26



🤖 AI Summary

Problem: Existing RegTech systems and large language models struggle to generate accurate, executable compliance code from complex Chinese financial regulations. Method: We propose FinCheck, an end-to-end compliance verification framework featuring (i) domain-adaptive pretraining, (ii) hierarchical logical parsing, (iii) structured prompt engineering, and (iv) interpretable report generation. Contribution/Results: We introduce the first large-scale, fine-grained Chinese dataset for financial regulation code generation. It comprises 361 regulations and 1,159 clauses annotated with a novel four-element schema (subject, condition, constraint, context), and each clause is paired with a deterministic Python implementation and a step-by-step reasoning explanation. Experiments show that FinCheck significantly improves code generation accuracy and logical consistency, outperforming state-of-the-art baselines across multiple compliance tasks and thereby enabling practical automated auditing and regulatory technology deployment.
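The four-element schema and its deterministic Python mapping can be illustrated with a minimal sketch. Everything in it is assumed for illustration: the `Clause` class, its field names, the `check_clause` function, the record keys, and the clause itself (including its 10% threshold) are hypothetical and are not taken from the Compliance-to-Code dataset or its annotation format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clause:
    """Hypothetical four-element representation of one regulatory clause."""
    subject: str     # who the clause applies to
    condition: str   # when the clause applies
    constraint: str  # what must hold when it applies
    context: str     # supplementary information (scope, definitions, source)


# Hypothetical clause for illustration only; not quoted from any regulation.
CLAUSE = Clause(
    subject="listed company",
    condition="the company provides external guarantees",
    constraint="total external guarantees must not exceed 10% of net assets",
    context="illustrative example; threshold chosen arbitrarily",
)


def check_clause(record: dict) -> bool:
    """Deterministic check for CLAUSE over a hypothetical company record.

    Returns True when the record complies or when the clause does not apply.
    """
    # Subject guard: the clause only binds listed companies.
    if not record.get("is_listed", False):
        return True
    # Condition guard: only relevant if the company has external guarantees.
    guarantees = record.get("external_guarantees", 0.0)
    if guarantees <= 0:
        return True
    # Constraint: guarantees are capped at 10% of net assets.
    return guarantees <= 0.10 * record["net_assets"]
```

The check mirrors the schema: the subject and condition act as guards that return early when the clause does not apply, and only the constraint yields a pass/fail verdict. For example, a listed company with guarantees of 20 against net assets of 100 fails the check, while one with guarantees of 5 passes.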


📝 Abstract

Regulatory compliance has become a cornerstone of corporate governance, ensuring adherence to systematic legal frameworks. Financial regulations in particular often comprise highly intricate provisions, layered logical structures, and numerous exceptions, which make them labor-intensive to apply and difficult to interpret. To mitigate this, Regulatory Technology (RegTech) and Large Language Models (LLMs) have recently gained significant attention for automating the conversion of regulatory text into executable compliance logic. However, their performance remains suboptimal, particularly on Chinese-language financial regulations, due to three key limitations: (1) incomplete representation of domain-specific knowledge, (2) insufficient hierarchical reasoning capabilities, and (3) failure to maintain temporal and logical coherence. One promising solution is to develop domain-specific, code-oriented datasets for model training. Existing datasets such as LexGLUE, LegalBench, and CODE-ACCORD, however, are often English-focused, mismatched to the financial compliance domain, or too coarse-grained for compliance code generation. To fill these gaps, we present Compliance-to-Code, the first large-scale Chinese dataset dedicated to financial regulatory compliance. It covers 1,159 annotated clauses from 361 regulations across ten categories; each clause is modularly structured into four logical elements (subject, condition, constraint, and contextual information), along with the relations between regulations. We provide deterministic Python code mappings, detailed code reasoning, and code explanations to facilitate automated auditing. To demonstrate the dataset's utility, we present FinCheck, a pipeline for regulation structuring, code generation, and report generation.
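The report-generation end of a FinCheck-style pipeline can be sketched as follows, assuming the code-generation stage has already produced one Python predicate per clause. The names `run_checks`, `render_report`, and `ClauseResult`, the clause IDs and thresholds, and the modeling of a regulation relation as an exception that exempts a clause are all hypothetical; the paper's actual pipeline may differ.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Mapping

Record = Mapping[str, Any]
Check = Callable[[Record], bool]


@dataclass
class ClauseResult:
    clause_id: str
    status: str       # "PASS", "FAIL", or "EXEMPT"
    explanation: str


def run_checks(record: Record, checks: Dict[str, Check],
               exceptions: Dict[str, Check],
               explanations: Dict[str, str]) -> List[ClauseResult]:
    """Evaluate each clause's check, honouring hypothetical exception relations."""
    results = []
    for clause_id in sorted(checks):  # sorted for a deterministic report order
        exemption = exceptions.get(clause_id)
        if exemption is not None and exemption(record):
            status = "EXEMPT"  # an exception relation overrides this clause
        elif checks[clause_id](record):
            status = "PASS"
        else:
            status = "FAIL"
        results.append(ClauseResult(clause_id, status,
                                    explanations.get(clause_id, "")))
    return results


def render_report(entity: str, results: List[ClauseResult]) -> str:
    """Render a plain-text report: one summary line, then one line per clause."""
    failures = sum(r.status == "FAIL" for r in results)
    lines = [f"Compliance report for {entity}: "
             f"{len(results)} clauses checked, {failures} violation(s)"]
    for r in results:
        lines.append(f"[{r.status}] {r.clause_id}: {r.explanation}")
    return "\n".join(lines)


# Hypothetical generated checks and thresholds, for illustration only.
CHECKS: Dict[str, Check] = {
    "C1": lambda r: r["leverage_ratio"] >= 0.04,
    "C2": lambda r: r["liquidity_coverage"] >= 1.0,
}
# Hypothetical relation: C2 does not apply to entities in a wind-down period.
EXCEPTIONS: Dict[str, Check] = {
    "C2": lambda r: bool(r.get("in_wind_down", False)),
}
EXPLANATIONS = {
    "C1": "leverage ratio must be at least 4%",
    "C2": "liquidity coverage ratio must be at least 100%",
}

if __name__ == "__main__":
    record = {"leverage_ratio": 0.05, "liquidity_coverage": 0.8}
    print(render_report("Example Bank",
                        run_checks(record, CHECKS, EXCEPTIONS, EXPLANATIONS)))
```

Running the script prints a two-clause report in which C1 passes and C2 fails; adding `"in_wind_down": True` to the record marks C2 as EXEMPT instead. Keeping each check a pure function of the record is what makes the resulting report reproducible and auditable.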

Problem

Research questions and friction points this paper is trying to address.

Automating Chinese financial regulation compliance with code generation

Addressing knowledge gaps in domain-specific regulatory logic conversion

Creating a structured dataset for precise compliance auditing automation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Large-scale Chinese dataset for financial compliance

Modular structuring with four logical elements

Python code mappings for automated auditing
