Proactive Defense: Compound AI for Detecting Persuasion Attacks and Measuring Inoculation Effectiveness

📅 2025-11-23
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This study addresses the detection and quantification of persuasion attacks targeting human cognition across information environments, with particular emphasis on large language models' (LLMs) domain-specific vulnerabilities, in order to strengthen generative AI safety and human cognitive resilience. Method: We propose BRIES, a compound AI system integrating generative adversarial agents, configurable detectors, content-inoculation defenses, and causal inference modules to enable multi-agent collaboration for attack identification, defensive response, and causal attribution. Methodologically, we introduce a persuasion-technique taxonomy grounded in SemEval 2023, a controllable synthetic dataset, and a causal evaluation framework to uncover LLM-specific disparities in rhetorical comprehension. Contribution/Results: Experiments demonstrate GPT-4's superior performance in detecting complex persuasion techniques; open-source LLMs exhibit significant limitations in fine-grained rhetorical recognition; and temperature settings and prompt engineering critically modulate detection robustness. Code and data are publicly released.

πŸ“ Abstract
This paper introduces BRIES, a novel compound AI architecture designed to detect persuasion attacks across information environments and to measure the effectiveness of defenses against them. We present a system with specialized agents: a Twister that generates adversarial content employing targeted persuasion tactics, a Detector that identifies attack types with configurable parameters, a Defender that creates resilient content through content inoculation, and an Assessor that employs causal inference to evaluate inoculation effectiveness. Experimenting with the SemEval 2023 Task 3 taxonomy on a synthetic persuasion dataset, we demonstrate significant variations in detection performance across language agents. Our comparative analysis reveals clear performance disparities: GPT-4 achieves superior detection accuracy on complex persuasion techniques, while open-source models such as Llama3 and Mistral show notable weaknesses in identifying subtle rhetorical techniques, suggesting that different architectures encode and process persuasive language patterns in fundamentally different ways. We show that prompt engineering dramatically affects detection efficacy, with temperature settings and confidence scoring producing model-specific variations: Gemma and GPT-4 perform optimally at lower temperatures, while Llama3 and Mistral show improved capabilities at higher temperatures. Our causal analysis provides novel insights into the socio-emotional-cognitive signatures of persuasion attacks, revealing that different attack types target specific cognitive dimensions. This research advances generative AI safety and cognitive security by quantifying LLM-specific vulnerabilities to persuasion attacks, and it delivers a framework for enhancing human cognitive resilience through structured interventions before exposure to harmful content.
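The four-agent loop described in the abstract could be sketched roughly as follows. This is a minimal illustration only: the agent names (Twister, Detector, Defender, Assessor) come from the abstract, but every interface, the keyword-based toy detector, and the effect score are assumptions, not the authors' implementation (which uses LLM agents and causal inference).

```python
from dataclasses import dataclass

@dataclass
class Message:
    text: str
    tactic: str  # persuasion-technique label, e.g. from the SemEval 2023 taxonomy

class Twister:
    """Generates adversarial content using a targeted persuasion tactic."""
    def attack(self, claim: str, tactic: str) -> Message:
        templates = {
            "appeal_to_fear": f"If you ignore this, disaster follows: {claim}",
            "loaded_language": f"Only a fool would doubt that {claim}",
        }
        return Message(text=templates.get(tactic, claim), tactic=tactic)

class Detector:
    """Identifies the attack type (keyword stand-in for an LLM classifier)."""
    def detect(self, msg: Message) -> str:
        if "disaster" in msg.text.lower():
            return "appeal_to_fear"
        if "fool" in msg.text.lower():
            return "loaded_language"
        return "none"

class Defender:
    """Inoculates content by prepending a forewarning about the detected tactic."""
    def inoculate(self, msg: Message, tactic: str) -> Message:
        warning = f"[Note: the following text uses the '{tactic}' tactic.] "
        return Message(text=warning + msg.text, tactic=tactic)

class Assessor:
    """Scores inoculation effectiveness (placeholder for the causal-inference module)."""
    def effect(self, raw: Message, inoculated: Message) -> float:
        # Toy proxy: inoculation "took" if a forewarning was actually added.
        return 1.0 if inoculated.text != raw.text else 0.0

def run_pipeline(claim: str, tactic: str) -> dict:
    raw = Twister().attack(claim, tactic)
    detected = Detector().detect(raw)
    shielded = Defender().inoculate(raw, detected)
    return {"detected": detected, "effect": Assessor().effect(raw, shielded)}

print(run_pipeline("the vote was rigged", "appeal_to_fear"))
# → {'detected': 'appeal_to_fear', 'effect': 1.0}
```

The point of the sketch is the data flow, not the components: attack generation, detection, inoculation, and assessment are decoupled, so each stub could be swapped for a different LLM without touching the others.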
Problem

Research questions and friction points this paper is trying to address.

Detect persuasion attacks using compound AI architecture
Measure inoculation effectiveness through causal inference analysis
Quantify LLM vulnerabilities to enhance cognitive security
Innovation

Methods, ideas, or system contributions that make the work stand out.

Compound AI architecture with specialized adversarial agents
Causal inference to evaluate content inoculation effectiveness
Prompt engineering and temperature tuning for detection optimization
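The temperature-tuning idea above can be illustrated with a generic sweep harness. Everything here is hypothetical: `detect_at_temperature` is a stub standing in for a real LLM detection call, and the synthetic accuracy profiles merely mimic the paper's qualitative finding (some models peak at low temperature, others at high); they are not the reported results.

```python
def detect_at_temperature(model: str, temperature: float) -> float:
    """Stub returning a synthetic detection accuracy for (model, temperature)."""
    # Assumed shapes only: 'gemma' degrades with temperature, 'llama3' improves.
    profiles = {
        "gemma": lambda t: 0.80 - 0.30 * t,
        "llama3": lambda t: 0.50 + 0.25 * t,
    }
    return round(profiles[model](temperature), 3)

def best_temperature(model: str, grid=(0.0, 0.3, 0.7, 1.0)) -> float:
    """Sweep a temperature grid and keep the setting with the best accuracy."""
    return max(grid, key=lambda t: detect_at_temperature(model, t))

print(best_temperature("gemma"))   # low-temperature optimum under the stub
print(best_temperature("llama3"))  # high-temperature optimum under the stub
```

Replacing the stub with an actual evaluation loop over labeled adversarial examples turns this into a per-model calibration step, which is the practical upshot of the paper's observation that optimal decoding settings are model-specific.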
Svitlana Volkova
Chief of AI, Office of Science and Technology, Aptima Inc.
Artificial Intelligence · Machine Learning · Computational Social Science
Will Dupree
Aptima, Inc.
Hsien-Te Kao
Aptima, Inc.
Peter Bautista
Aptima, Inc.
Gabe Ganberg
Aptima, Inc.
Jeff Beaubien
Aptima, Inc.
Laura Cassani
Aptima, Inc.