🤖 AI Summary
This paper addresses the challenge of automatically detecting emerging “dog whistles”—politically charged, context-dependent expressions with dual meanings designed to evade content moderation—on social media. It introduces FETCH!, a novel unsupervised task: discovering previously undocumented dog whistles in large-scale, dynamic textual corpora. To tackle this task, the authors propose the “shared habitat” hypothesis and present EarShot, the first unsupervised framework for the problem. EarShot integrates semantic retrieval via vector databases, contextual understanding using large language models, distributional similarity modeling, and group-level contextual alignment. Evaluated on three real-world social media datasets, EarShot achieves a 3.2× improvement in recall for emerging dog whistles over state-of-the-art methods, demonstrating strong effectiveness and generalizability in low-resource, highly dynamic settings.
📝 Abstract
WARNING: This paper contains content that may be upsetting or offensive to some readers. Dog whistles are coded expressions with dual meanings: one intended for the general public (outgroup) and another that conveys a specific message to an intended audience (ingroup). Often, these expressions are used to convey controversial political opinions while maintaining plausible deniability and to slip past content moderation filters. Identification of dog whistles relies on curated lexicons, which struggle to stay up to date. We introduce FETCH!, a task for finding novel dog whistles in massive social media corpora. We find that state-of-the-art systems fail to achieve meaningful results across three distinct social media case studies. We present EarShot, a strong baseline system that combines the strengths of vector databases and Large Language Models (LLMs) to efficiently and effectively identify new dog whistles.