🤖 AI Summary
Large language models (LLMs) deployed in financial applications harbor latent vulnerabilities: adversaries can circumvent regulatory compliance by eliciting outputs that appear benign yet are substantively noncompliant.
Method: We propose the first red-teaming framework tailored to financial compliance. Departing from conventional harm-focused red-teaming, it stages multi-turn adversarial dialogues, termed Risk-Concealment Attacks (RCA), that progressively obscure malicious intent to elicit responses that are compliant on the surface but materially violate financial regulations.
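RCA's mechanics are only described at a high level here, so the following is a minimal sketch of what such a multi-turn risk-concealment loop could look like; `query_llm`, `is_refusal`, and `conceal` are hypothetical stand-ins for the target model, the compliance judge, and the concealment strategy, not the paper's actual implementation.

```python
from typing import Callable

Message = dict[str, str]  # {"role": ..., "content": ...}

def risk_concealment_attack(
    target_request: str,
    query_llm: Callable[[list[Message]], str],     # target model under test (assumed interface)
    is_refusal: Callable[[str], bool],             # refusal/compliance judge (assumed interface)
    conceal: Callable[[str, list[Message]], str],  # rewrites the request to hide risk (assumed interface)
    max_turns: int = 5,
) -> tuple[bool, list[Message]]:
    """Iteratively reframe a risky financial request across dialogue turns
    until the target model returns a substantive (non-refusal) answer."""
    history: list[Message] = []
    prompt = target_request
    for _ in range(max_turns):
        history.append({"role": "user", "content": prompt})
        reply = query_llm(history)
        history.append({"role": "assistant", "content": reply})
        if not is_refusal(reply):
            return True, history   # model answered substantively: attack succeeded
        # Conceal the regulatory risk further, e.g. by recasting the request
        # as hypothetical research or splitting it into benign sub-questions.
        prompt = conceal(target_request, history)
    return False, history          # every turn was refused: attack failed
```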
Contribution/Results: We construct FIN-Bench, the first financial-safety evaluation benchmark, combining domain-adapted prompt engineering with systematic human annotation. Experiments across nine state-of-the-art LLMs reveal an average attack success rate (ASR) of 93.18%, including 98.28% on GPT-4.1 and 97.56% on OpenAI o1, exposing critical deficiencies in current alignment techniques for financial regulatory contexts.
📝 Abstract
Large Language Models (LLMs) are increasingly integrated into financial applications, yet existing red-teaming research primarily targets harmful content and largely neglects regulatory risks. In this work, we investigate the vulnerability of financial LLMs through red-teaming. We introduce Risk-Concealment Attacks (RCA), a novel multi-turn framework that iteratively conceals regulatory risks to elicit seemingly compliant yet regulation-violating responses from LLMs. To enable systematic evaluation, we construct FIN-Bench, a domain-specific benchmark for assessing LLM safety in financial contexts. Extensive experiments on FIN-Bench demonstrate that RCA effectively bypasses nine mainstream LLMs, achieving an average attack success rate (ASR) of 93.18%, including 98.28% on GPT-4.1 and 97.56% on OpenAI o1. These findings reveal a critical gap in current alignment techniques and underscore the urgent need for stronger moderation mechanisms in financial domains. We hope this work offers practical insights for advancing robust and domain-aware LLM alignment.
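For concreteness, ASR here is simply the percentage of attack attempts judged to have elicited a regulation-violating response; a minimal computation, assuming per-attempt success labels are already available (e.g., from human annotation or a judge model):

```python
def attack_success_rate(successes: list[bool]) -> float:
    """Percentage of attack attempts that elicited a regulation-violating response."""
    return 100.0 * sum(successes) / len(successes) if successes else 0.0

print(attack_success_rate([True, True, False, True]))  # 75.0 (illustrative data only)
```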