CARO: Chain-of-Analogy Reasoning Optimization for Robust Content Moderation

📅 2026-04-12

🤖 AI Summary
This work addresses the vulnerability of large language models to contextual “decision shortcuts” in ambiguous content moderation cases, which undermines the robustness of their judgments. The authors propose CARO, a two-stage training framework inspired by findings from cognitive psychology on how expert moderators reason by analogy. The first stage constructs analogical reasoning chains via retrieval-augmented generation and applies supervised fine-tuning. The second stage applies a tailored direct preference optimization step to reinforce analogical reasoning behaviors. At inference time the model dynamically generates context-adaptive analogical references, which avoids the limitations of static retrieval. CARO achieves a 24.9% average F1 improvement on ambiguous moderation benchmarks, substantially outperforming state-of-the-art models such as DeepSeek R1, QwQ, and LLaMA Guard.

📝 Abstract
Current large language models (LLMs), even those explicitly trained for reasoning, often struggle with ambiguous content moderation cases due to misleading "decision shortcuts" embedded in context. Inspired by cognitive psychology insights into expert moderation, we introduce CARO (Chain-of-Analogy Reasoning Optimization), a novel two-stage training framework to induce robust analogical reasoning in LLMs. First, CARO bootstraps analogical reasoning chains via retrieval-augmented generation (RAG) on moderation data and performs supervised fine-tuning (SFT). Second, we propose a customized direct preference optimization (DPO) approach to reinforce analogical reasoning behaviors explicitly. Unlike static retrieval methods, CARO dynamically generates tailored analogical references during inference, effectively mitigating harmful decision shortcuts. Extensive experiments demonstrate that CARO substantially outperforms state-of-the-art reasoning models (DeepSeek R1, QwQ), specialized moderation models (LLaMA Guard), and advanced fine-tuning and retrieval-augmented methods, achieving an average F1 score improvement of 24.9% on challenging ambiguous moderation benchmarks.
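
The abstract does not spell out how the DPO stage is customized, so the following is only a minimal sketch of the standard DPO objective for a single preference pair, not the authors' variant. It assumes the "chosen" response is one that reasons through an analogy and the "rejected" response is one that follows a decision shortcut. The function names, the log-probability inputs, and the default `beta` are hypothetical and chosen for illustration.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value; the abstract gives no hyperparameters
) -> float:
    """Standard DPO loss for one (chosen, rejected) pair.

    Each argument is the summed token log-probability of a full response
    under the trainable policy or the frozen reference model.
    """
    # Implicit rewards: how far the policy has moved from the reference.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)) == softplus(-logits)
    return softplus(-logits)


if __name__ == "__main__":
    # Policy identical to the reference: loss equals log(2).
    print(dpo_loss(-3.0, -3.0, -3.0, -3.0))
    # Policy prefers the chosen (analogical) response: loss drops below log(2).
    print(dpo_loss(-1.0, -5.0, -3.0, -3.0))
```

In this form the loss falls as the policy raises the likelihood of the chosen response relative to the rejected one, measured against the reference model, which is the general mechanism by which a preference stage could reinforce analogical reasoning over shortcut answers.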
Problem

Research questions and friction points this paper is trying to address.

content moderation
ambiguous cases
decision shortcuts
large language models
robust reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Chain-of-Analogy Reasoning
Content Moderation
Retrieval-Augmented Generation
Direct Preference Optimization
Decision Shortcuts