🤖 AI Summary
This study addresses the high regulatory risk inherent in financial dialogues built on large language models (LLMs), where existing approaches struggle to handle multi-turn semantic evolution and complex compliance constraints. To tackle this challenge, the authors propose FinSec, the first end-to-end safety detection framework tailored for financial agents, featuring a novel multi-stage generative backtracking mechanism. FinSec integrates suspicious behavior pattern analysis, adversarial reasoning, semantic safety modeling, and ensemble risk decision-making to enable structured and interpretable risk identification. Experimental results show that FinSec achieves an F1 score of 90.13% (6–14 percentage points above baselines), reduces the attack success rate (ASR) to 9.09%, attains an area under the precision-recall curve (AUPRC) of 0.9189, and yields a utility-safety composite score of 0.9098, substantially improving both detection robustness and accuracy.
📝 Abstract
With the rapid adoption of large language models (LLMs) in financial services, detecting unsafe dialogue under high regulatory risk poses significant challenges. Existing methods rely mainly on single-dimensional semantic judgments or fixed rules, which makes them inadequate for handling multi-turn semantic evolution and complex regulatory clauses; moreover, no existing models are specifically designed for financial security detection. To address these issues, this paper proposes FinSec, a four-tier security detection framework for financial agents. FinSec enables structured, interpretable, end-to-end identification of actual financial risks through four components: suspicious behavior pattern analysis, delayed risk and adversarial inference, semantic security analysis, and integrated risk-based decision-making. Notably, FinSec significantly improves the robustness of high-risk dialogue detection while maintaining model utility. Experimental results demonstrate FinSec's leading performance. In overall detection capability, FinSec achieves an F1 score of 90.13%, improving on baseline models by 6–14 percentage points; it reduces the attack success rate (ASR) to 9.09%, markedly lowering the probability of unsafe outputs; and it raises the AUPRC to 0.9189, an approximate 9.7% gain over general frameworks. Finally, in balancing utility and safety, FinSec obtains a composite score of 0.9098, delivering robust and efficient protection for financial agent dialogues.
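For readers less familiar with the reported metrics, the following is a minimal sketch of how F1, ASR, and AUPRC are commonly computed for a binary "risky vs. safe" dialogue detector. It is not the authors' evaluation code. It assumes label 1 means a risky dialogue, assumes ASR is the fraction of adversarial prompts whose response is judged unsafe, and estimates AUPRC as average precision with no tied scores. The composite utility-safety score is not sketched because its formula is not given here. All function names are hypothetical.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1, treating label 1 (risky dialogue) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def attack_success_rate(unsafe_flags: Sequence[bool]) -> float:
    """Assumed definition: share of adversarial prompts that yielded an unsafe output."""
    return sum(unsafe_flags) / len(unsafe_flags)


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUPRC estimated as average precision (assumes no tied scores).

    Ranks items by descending risk score and averages the precision
    measured at the rank of each true positive.
    """
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    n_pos = sum(y_true)
    tp = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            tp += 1
            total += tp / rank  # precision at this positive's rank
    return total / n_pos


if __name__ == "__main__":
    # Toy data, purely illustrative (not from the paper).
    labels = [1, 0, 1, 1, 0, 0]
    preds = [1, 0, 0, 1, 1, 0]
    risk_scores = [0.9, 0.2, 0.6, 0.8, 0.7, 0.1]
    print(f1_score(labels, preds))                       # 2/3 ≈ 0.667
    print(attack_success_rate([False, True, False, False]))  # 0.25
    print(average_precision(labels, risk_scores))        # 11/12 ≈ 0.917
```

Average precision is used here because it is a standard, threshold-free estimate of the area under the precision-recall curve; other interpolation schemes can give slightly different AUPRC values.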