Uncovering the Vulnerability of Large Language Models in the Financial Domain via Risk Concealment

📅 2025-09-07
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Large language models (LLMs) exhibit latent vulnerabilities in financial applications, enabling circumvention of regulatory compliance through ostensibly benign yet substantively non-compliant outputs. Method: We propose the first red-teaming framework tailored to financial compliance. Diverging from conventional harm-focused red-teaming, it introduces multi-turn adversarial dialogues, termed Risk-Concealment Attacks (RCA), that progressively obscure malicious intent to elicit superficially compliant but materially violating outputs. Contribution/Results: We construct FIN-Bench, the first financial safety-specific evaluation benchmark, integrating domain-adapted prompt engineering and systematic human annotation. Experiments across nine state-of-the-art LLMs reveal an average attack success rate of 93.18%, with GPT-4.1 and OpenAI o1 reaching 98.28% and 97.56%, respectively, demonstrating critical deficiencies in current alignment techniques for financial regulatory contexts.
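The paper does not include code here, but the attack loop it describes is straightforward to sketch. Below is a minimal, hypothetical illustration of a multi-turn risk-concealment dialogue against an OpenAI-style chat API; the helper names (`query`, `rca_attack`), the escalation schedule, and the judging step are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of a multi-turn risk-concealment attack loop.
# Not the authors' code: the turn schedule and helpers are placeholders.
from openai import OpenAI  # assumes the official openai>=1.0 client

client = OpenAI()

def query(messages, model="gpt-4.1"):
    """Send the running dialogue to the target model and return its reply."""
    resp = client.chat.completions.create(model=model, messages=messages)
    return resp.choices[0].message.content

def rca_attack(concealment_steps, model="gpt-4.1"):
    """Walk the target through progressively less innocuous framings.

    concealment_steps: user turns ordered from a benign framing of the
    topic down to the substantive, regulatory-risky ask. Each turn keeps
    the full dialogue history, so earlier "safe" context masks intent.
    """
    messages = []
    reply = None
    for step in concealment_steps:
        messages.append({"role": "user", "content": step})
        reply = query(messages, model)
        messages.append({"role": "assistant", "content": reply})
    return reply  # the final response is then judged for violations
```

In spirit, each turn stays individually innocuous while the accumulated context steers the model toward an answer that a single-turn safety filter would likely refuse.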

📝 Abstract
Large Language Models (LLMs) are increasingly integrated into financial applications, yet existing red-teaming research primarily targets harmful content, largely neglecting regulatory risks. In this work, we aim to investigate the vulnerability of financial LLMs through red-teaming approaches. We introduce Risk-Concealment Attacks (RCA), a novel multi-turn framework that iteratively conceals regulatory risks to provoke seemingly compliant yet regulatory-violating responses from LLMs. To enable systematic evaluation, we construct FIN-Bench, a domain-specific benchmark for assessing LLM safety in financial contexts. Extensive experiments on FIN-Bench demonstrate that RCA effectively bypasses nine mainstream LLMs, achieving an average attack success rate (ASR) of 93.18%, including 98.28% on GPT-4.1 and 97.56% on OpenAI o1. These findings reveal a critical gap in current alignment techniques and underscore the urgent need for stronger moderation mechanisms in financial domains. We hope this work offers practical insights for advancing robust and domain-aware LLM alignment.
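As a point of reference for the headline metric, ASR is the share of attack dialogues whose final response a judge flags as regulatory-violating. A minimal computation is sketched below; the 58-dialogue denominator is assumed purely so the example reproduces the reported GPT-4.1 figure.

```python
def attack_success_rate(judgments):
    """ASR = (# dialogues judged regulatory-violating) / (# dialogues), in %.

    judgments: iterable of booleans, one per attack dialogue, True when a
    human or LLM judge flags the final response as violating.
    """
    judgments = list(judgments)
    return 100.0 * sum(judgments) / len(judgments) if judgments else 0.0

# 57 successes out of an assumed 58 dialogues gives about 98.28%,
# matching the GPT-4.1 number reported above.
print(f"{attack_success_rate([True] * 57 + [False]):.2f}%")  # 98.28%
```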
Problem

Research questions and friction points this paper is trying to address.

Investigating financial LLM vulnerabilities via red-teaming
Introducing Risk-Concealment Attacks to provoke regulatory violations
Assessing LLM safety gaps in financial contexts through FIN-Bench
Innovation

Methods, ideas, or system contributions that make the work stand out.

Risk-Concealment Attacks multi-turn framework
FIN-Bench financial safety benchmark
93.18% average attack success rate across nine mainstream LLMs
Authors

Gang Cheng
Independent researcher, New York, NY
Haibo Jin
HKUST
Computer Vision · Medical Image Analysis · Vision-Language Modeling
Wenbin Zhang
Florida International University, FL
Haohan Wang
School of Information Sciences, University of Illinois Urbana-Champaign
Computational Biology · Agentic AI · AI4Science · AI security
Jun Zhuang
Boise State University, Boise, ID