🤖 AI Summary
This work addresses image-based contextual deception, in which an authentic image is paired with misleading text, by proposing ACCNote, an automated Community Notes generation framework built on large vision-language models. ACCNote combines retrieval-augmented generation with multi-agent collaboration to produce concise, evidence-based context-corrective notes that help users identify disinformation. The study introduces XCheck, the first real-world dataset specifically curated for contextual deception, and proposes the Context Helpfulness Score (CHS), an evaluation metric aligned with user study outcomes rather than lexical overlap. Experiments on XCheck show that ACCNote outperforms existing baselines and the commercial tool GPT5-mini in both deception detection and note generation.
📝 Abstract
Community Notes have emerged as an effective crowd-sourced mechanism for combating online deception on social media platforms. However, their reliance on human contributors limits both timeliness and scalability. In this work, we study automated Community Notes generation for image-based contextual deception, where an authentic image is paired with misleading context (e.g., time, entity, or event). Unlike prior work, which primarily focuses on deception detection (i.e., a binary judgment of whether a post is true or false), Community Notes-style systems need to generate concise, grounded notes that help users recover the missing or corrected context. This problem remains underexplored for three reasons: (i) datasets that support the research are scarce; (ii) methods must handle the dynamic nature of contextual deception; and (iii) evaluation is difficult because standard metrics do not capture whether notes actually improve user understanding. To address these gaps, we curate a real-world dataset, XCheck, comprising X posts with associated Community Notes and external contexts. We further propose ACCNote, an Automated Context-Corrective Note generation method: a retrieval-augmented, multi-agent collaboration framework built on large vision-language models. Finally, we introduce a new evaluation metric, the Context Helpfulness Score (CHS), which aligns with user study outcomes rather than relying on lexical overlap. Experiments on XCheck show that ACCNote improves both deception detection and note generation performance over baselines, and exceeds the commercial tool GPT5-mini. Together, our dataset, method, and metric advance the practical automated generation of context-corrective notes, toward more responsible online social networks.
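To illustrate why lexical overlap can be a poor proxy for note helpfulness, the following minimal sketch computes a simple unigram F1 score. It is not CHS and not part of ACCNote; the function `unigram_f1` and all three example notes are hypothetical and invented purely for illustration.

```python
from collections import Counter


def unigram_f1(candidate: str, reference: str) -> float:
    """Token-level F1 between two strings (a simple lexical-overlap score)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical notes, invented for illustration only.
reference = "this photo was taken in 2015 not during the recent flood"
# Conveys the same correction with different wording.
paraphrase = "the image dates from 2015 and predates the current flooding"
# Shares most words with the reference but states the wrong year and context.
wrong_note = "this photo was taken in 2023 during the recent flood"

f1_paraphrase = unigram_f1(paraphrase, reference)
f1_wrong = unigram_f1(wrong_note, reference)

print(f"paraphrase (correct): {f1_paraphrase:.2f}")
print(f"wrong note:           {f1_wrong:.2f}")
```

In this toy case the factually wrong note scores higher than the correct paraphrase, because overlap rewards shared wording rather than corrected context. That is the kind of mismatch a metric aligned with user study outcomes is meant to avoid.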