🤖 AI Summary
Existing sarcasm detection methods suffer from limited robustness due to single-perspective analysis, static reasoning paths, and hallucination-prone generation. To address these limitations, we propose a decoupled architecture comprising a Dynamic Agentive Reasoning Engine (DARE) and a lightweight Rationale Adjudicator (RA), forming a linguistics-driven multi-agent collaborative framework. Specialized agents independently generate structured reasoning chains from distinct linguistic dimensions (e.g., semantic incongruity, pragmatic mismatch, syntactic anomaly), while the separately trained RA evaluates and adjudicates their outputs, thereby disentangling reasoning from final classification and substantially mitigating hallucination. Evaluated on four benchmark datasets, our approach achieves state-of-the-art performance, with average improvements of +6.75% in accuracy and +6.29% in Macro-F1. Our core contribution is the first systematic integration of multi-perspective dynamic reasoning, linguistics-guided agent specialization, and reasoning-classification decoupling for sarcasm detection.
📝 Abstract
Sarcasm detection is a crucial yet challenging Natural Language Processing task. Existing methods based on Large Language Models (LLMs) are often limited by single-perspective analysis, static reasoning pathways, and a susceptibility to hallucination when processing complex ironic rhetoric, which impacts their accuracy and reliability. To address these challenges, we propose **SEVADE**, a novel **S**elf-**Ev**olving multi-agent **A**nalysis framework with **D**ecoupled **E**valuation for hallucination-resistant sarcasm detection. The core of our framework is a Dynamic Agentive Reasoning Engine (DARE), which utilizes a team of specialized agents grounded in linguistic theory to perform a multifaceted deconstruction of the text and generate a structured reasoning chain. Subsequently, a separate lightweight Rationale Adjudicator (RA) performs the final classification based solely on this reasoning chain. This decoupled architecture mitigates the risk of hallucination by separating complex reasoning from the final judgment. Extensive experiments on four benchmark datasets demonstrate that our framework achieves state-of-the-art performance, with average improvements of **6.75%** in Accuracy and **6.29%** in Macro-F1 score.
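To make the decoupling concrete, here is a minimal Python sketch of the two-stage pipeline the abstract describes: specialized agents each contribute one structured reasoning step, and an adjudicator classifies from the reasoning chain alone. All agent logic below is a hypothetical heuristic stand-in (the paper's agents are LLM-backed and the RA is a separately trained lightweight model); the function and field names are illustrative, not the paper's API.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class RationaleStep:
    perspective: str   # linguistic dimension, e.g. "semantic incongruity"
    analysis: str      # the agent's structured finding for that dimension

def semantic_agent(text: str) -> RationaleStep:
    # Hypothetical stand-in: flags positive wording about a disliked subject.
    if "love" in text and "Monday" in text:
        cue = "positive wording paired with a negative situation"
    else:
        cue = "no clear incongruity"
    return RationaleStep("semantic incongruity", cue)

def pragmatic_agent(text: str) -> RationaleStep:
    # Hypothetical stand-in: treats exclamation as exaggerated enthusiasm.
    if "!" in text:
        cue = "exaggerated enthusiasm mismatched with context"
    else:
        cue = "tone consistent with content"
    return RationaleStep("pragmatic mismatch", cue)

def build_reasoning_chain(text: str,
                          agents: List[Callable[[str], RationaleStep]]) -> List[RationaleStep]:
    # DARE role: each specialized agent contributes one structured step.
    return [agent(text) for agent in agents]

def rationale_adjudicator(chain: List[RationaleStep]) -> str:
    # RA role: classifies from the reasoning chain ALONE, never the raw text.
    # This separation is the decoupling that targets hallucination.
    cues = sum(1 for step in chain
               if "mismatch" in step.analysis or "paired with" in step.analysis)
    return "sarcastic" if cues >= 1 else "literal"

agents = [semantic_agent, pragmatic_agent]
chain = build_reasoning_chain("I just love Mondays!", agents)
print(rationale_adjudicator(chain))  # → sarcastic
```

Note that `rationale_adjudicator` receives only the chain of `RationaleStep` objects, so any final decision is grounded in the agents' explicit rationales rather than free-form generation over the input text.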