🤖 AI Summary
This paper addresses the challenge of rapidly identifying untrustworthy domains in social media and search engines. We introduce the concept of “dredge words”—search queries disproportionately monopolized by low-credibility domains—and propose the first joint credibility modeling framework integrating web page graphs with social propagation graphs. Methodologically, we design a dredge word detection algorithm that jointly models search ranking bias and social retweeting paths, and we develop a multi-source heterogeneous graph neural network (combining the webgraph with social mention/retweet graphs) to jointly learn representations of search intent and domain credibility. Contributions include: (1) releasing the first high-quality benchmark dataset of 12,000 annotated dredge words; (2) achieving state-of-the-art performance on website credibility classification; (3) significantly improving top-k identification of untrustworthy domains; and (4) uncovering, for the first time, strong empirical associations between dredge words and both social platforms and e-commerce ecosystems.
📝 Abstract
Proactive content moderation requires platforms to rapidly and continuously evaluate the credibility of websites. Leveraging the direct and indirect paths users follow to unreliable websites, we develop a website credibility classification and discovery system that integrates both webgraph and large-scale social media contexts. We additionally introduce the concept of dredge words, terms or phrases for which unreliable domains rank highly on search engines, and provide the first exploration of their usage on social media. Our graph neural networks, which combine webgraph and social media contexts, achieve state-of-the-art results in website credibility classification and significantly improve the top-k identification of unreliable domains. Additionally, we release a novel dataset of dredge words, highlighting their strong connections to both social media and online commerce platforms.
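To make the dredge-word definition concrete, here is a minimal toy sketch (our own illustration, not the authors' detection algorithm): given ranked search results per query and a list of known unreliable domains, flag queries whose top results are dominated by those domains. The `top_k` and `threshold` parameters, and all domain names, are hypothetical.

```python
def flag_dredge_words(search_results, unreliable, top_k=5, threshold=0.5):
    """Flag candidate dredge words.

    search_results: dict mapping query -> ranked list of result domains.
    unreliable: set of domains known to be low-credibility.
    Returns dict mapping flagged query -> share of unreliable domains
    among its top_k results.
    """
    flagged = {}
    for query, ranked in search_results.items():
        top = ranked[:top_k]
        if not top:
            continue
        share = sum(d in unreliable for d in top) / len(top)
        if share >= threshold:
            flagged[query] = share
    return flagged


# Hypothetical example: one query monopolized by unreliable domains.
results = {
    "miracle cure x": ["badnews.example", "hoax.example", "clinic.example",
                       "badnews.example", "scam.example"],
    "city weather": ["weather.example", "news.example", "gov.example"],
}
unreliable = {"badnews.example", "hoax.example", "scam.example"}
print(flag_dredge_words(results, unreliable))  # → {'miracle cure x': 0.8}
```

The paper's actual method additionally models social propagation signals; this sketch covers only the search-ranking side of the definition.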