CNFinBench: A Benchmark for Safety and Compliance of Large Language Models in Finance

📅 2025-12-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing financial large language model (LLM) safety benchmarks emphasize textbook-style QA, failing to capture real-world regulatory compliance, investor protection, multi-turn adversarial risks (e.g., jailbreaking, prompt injection), long-document grounding, and over-reliance risks from RAG/tool use, and they rely on non-transparent evaluation protocols. Method: We introduce CNFinBench, the first domain-specific benchmark for financial LLM safety and compliance. It proposes a tripartite "Capability–Compliance–Safety" evaluation framework; designs the quantifiable, multi-turn-robust Harmful Instruction Compliance Score (HICS); and incorporates dynamic option perturbation with LLM–human collaborative adjudication to ensure auditable, traceable evaluation. Results: Across 21 models and 15 subtasks, we observe a significant capability–compliance gap (61.0 vs. 34.18), and most models achieve only partial adversarial resistance (HICS 60–79), empirically demonstrating that refusal behavior alone does not guarantee safety.

📝 Abstract
Large language models are increasingly deployed across the financial sector for tasks such as research, compliance, risk analysis, and customer service, which makes rigorous safety evaluation essential. However, existing financial benchmarks primarily focus on textbook-style question answering and numerical problem solving, but fail to evaluate models' real-world safety behaviors. They weakly assess regulatory compliance and investor-protection norms, rarely stress-test multi-turn adversarial tactics such as jailbreaks or prompt injection, inconsistently ground answers in long filings, ignore tool- or RAG-induced over-reach risks, and rely on opaque or non-auditable evaluation protocols. To close these gaps, we introduce CNFinBench, a benchmark that employs finance-tailored red-team dialogues and is structured around a Capability-Compliance-Safety triad, including evidence-grounded reasoning over long reports and jurisdiction-aware rule/tax compliance tasks. For systematic safety quantification, we introduce the Harmful Instruction Compliance Score (HICS) to measure how consistently models resist harmful prompts across multi-turn adversarial dialogues. To ensure auditability, CNFinBench enforces strict output formats with dynamic option perturbation for objective tasks and employs a hybrid LLM-ensemble plus human-calibrated judge for open-ended evaluations. Experiments on 21 models across 15 subtasks confirm a persistent capability-compliance gap: models achieve an average score of 61.0 on capability tasks but fall to 34.18 on compliance and risk-control evaluations. Under multi-turn adversarial dialogue tests, most systems reach only partial resistance (HICS 60-79), demonstrating that refusal alone is not a reliable proxy for safety without cited and verifiable reasoning.
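The abstract mentions "dynamic option perturbation" for objective tasks, i.e. shuffling answer options between runs so a model cannot exploit fixed option positions. The paper's exact procedure is not given on this page; the following is an illustrative sketch under that assumption (the function name `perturb_options` and the label scheme are hypothetical, not from the paper):

```python
import random

def perturb_options(options, answer_key, seed=None):
    """Shuffle multiple-choice option texts and remap the gold answer label.

    `options` maps labels ("A".."D") to option texts (assumed unique);
    `answer_key` is the label of the correct option before shuffling.
    Returns the shuffled option dict and the correct option's new label.
    """
    labels = sorted(options)                      # e.g. ["A", "B", "C", "D"]
    texts = [options[label] for label in labels]
    rng = random.Random(seed)                     # seeded for reproducible runs
    rng.shuffle(texts)
    shuffled = dict(zip(labels, texts))
    # Locate where the originally correct text landed after shuffling.
    correct_text = options[answer_key]
    new_key = next(l for l, t in shuffled.items() if t == correct_text)
    return shuffled, new_key
```

Re-evaluating each question under several seeds and requiring consistent answers is one way such perturbation can separate genuine knowledge from position or format artifacts.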
Problem

Research questions and friction points this paper is trying to address.

Evaluates financial LLM safety beyond textbook tasks
Measures compliance with regulations and investor protection norms
Tests multi-turn adversarial tactics and verifiable reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Finance-tailored red-team dialogues for safety evaluation
Harmful Instruction Compliance Score for multi-turn adversarial resistance
Hybrid LLM-ensemble plus human-calibrated judge for auditability
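The HICS metric above measures how consistently a model resists harmful prompts across multi-turn adversarial dialogues, reported on a 0–100 scale. The page does not state the formula; a minimal sketch, assuming HICS is the percentage of adversarial turns resisted (the function name `hics` and this aggregation are assumptions, not the paper's definition), might be:

```python
def hics(turn_outcomes):
    """Illustrative HICS-style score in [0, 100].

    `turn_outcomes` is a list of dialogues, each a list of booleans:
    True  = the model resisted the harmful instruction at that turn,
    False = the model complied. Returns the resisted-turn percentage.
    """
    turns = [o for dialogue in turn_outcomes for o in dialogue]
    if not turns:
        raise ValueError("no adversarial turns to score")
    return 100.0 * sum(turns) / len(turns)
```

Under this reading, the reported 60–79 band corresponds to models that resist most, but not all, turns of a sustained adversarial dialogue, i.e. partial rather than reliable resistance.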
Jinru Ding
Shanghai Artificial Intelligence Laboratory, Shanghai, 200232, China
Chao Ding
Shanghai Artificial Intelligence Laboratory, Shanghai, 200232, China
Wenrao Pang
Shanghai Artificial Intelligence Laboratory, Shanghai, 200232, China
Boyi Xiao
Shanghai Artificial Intelligence Laboratory, Shanghai, 200232, China
Zhiqiang Liu
Shanghai Artificial Intelligence Laboratory, Shanghai, 200232, China
Pengcheng Chen
Shanghai Artificial Intelligence Laboratory, Shanghai, 200232, China
Jiayuan Chen
Shanghai Artificial Intelligence Laboratory, Shanghai, 200232, China
Tiantian Yuan
Ant Group, Hangzhou, 31000, China
Junming Guan
Jiangsu Jinfu Digital Group AI Technology Co. Ltd., Suzhou, 215133, China
Yidong Jiang
Shanghai Artificial Intelligence Laboratory, Shanghai, 200232, China
Dawei Cheng
Tongji University (Data Mining, Graph Learning, Deep Learning, Big Data in Finance)
Jie Xu
Shanghai Artificial Intelligence Laboratory, Shanghai, 200232, China