Beyond the Crowd: LLM-Augmented Community Notes for Governing Health Misinformation

📅 2025-10-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Community Notes exhibits significant latency in governing health-related misinformation—median response time reaches 17.6 hours—rendering it ineffective against rapidly emerging false claims. To address this, we propose CrowdNotes+, a framework integrating large language models (LLMs) to accelerate and enhance the reliability of crowdsourced fact-checking. Our contributions are threefold: (1) a hierarchical three-stage evaluation protocol that decouples evidence relevance, factual accuracy, and clinical utility—mitigating the common bias of conflating linguistic fluency with factual correctness; (2) LLM-driven evidence-anchored annotation refinement and utility-guided automatic annotation generation; and (3) the HealthNotes benchmark—a curated health-domain dataset—and a fine-tuned helpfulness discrimination model. Experiments demonstrate substantial improvements in factual precision and evidence utility, advancing efficient, rigorous human-AI collaboration for health misinformation governance.
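A minimal sketch of how the hierarchical three-stage gate described above might look in code. The stage order (relevance, then correctness, then helpfulness) follows the summary, but the `Note` fields, the `Judge` interface, and the threshold are illustrative assumptions rather than the authors' implementation.

```python
# Sketch of the hierarchical three-stage evaluation: relevance -> correctness -> helpfulness.
# The data fields, judge interface, and threshold are hypothetical stand-ins.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Note:
    post: str       # the (potentially misleading) post being annotated
    text: str       # the community note
    evidence: str   # cited source snippet

# A judge maps (note, criterion) to a score in [0, 1]. In practice this could be
# an LLM or a fine-tuned helpfulness judge; here it is a generic callable.
Judge = Callable[[Note, str], float]

STAGES = ["relevance", "correctness", "helpfulness"]

def hierarchical_eval(note: Note, judge: Judge, threshold: float = 0.5) -> dict:
    """Score each stage in order; later stages are skipped once one fails,
    so a fluent but irrelevant or inaccurate note cannot earn helpfulness credit."""
    scores = {}
    for stage in STAGES:
        score = judge(note, stage)
        scores[stage] = score
        if score < threshold:          # gate: stop escalating a failed note
            scores["verdict"] = f"rejected at {stage}"
            return scores
    scores["verdict"] = "helpful"
    return scores

if __name__ == "__main__":
    def toy_judge(note: Note, stage: str) -> float:
        # Placeholder heuristic standing in for an LLM judge call.
        return 0.9 if note.evidence else 0.2

    note = Note(post="Vitamin C cures flu overnight.",
                text="Clinical trials show vitamin C does not cure influenza.",
                evidence="Cochrane review, 2013")
    print(hierarchical_eval(note, toy_judge))
```

Gating the stages in sequence is what decouples them: a note only reaches the helpfulness check after its evidence relevance and factual accuracy have passed, which is how the protocol avoids rewarding stylistic fluency alone.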

📝 Abstract
Community Notes, the crowd-sourced misinformation governance system on X (formerly Twitter), enables users to flag misleading posts, attach contextual notes, and vote on their helpfulness. However, our analysis of 30.8K health-related notes reveals significant latency, with a median delay of 17.6 hours before the first note receives a helpfulness status. To improve responsiveness during real-world misinformation surges, we propose CrowdNotes+, a unified framework that leverages large language models (LLMs) to augment Community Notes for faster and more reliable health misinformation governance. CrowdNotes+ integrates two complementary modes: (1) evidence-grounded note augmentation and (2) utility-guided note automation, along with a hierarchical three-step evaluation that progressively assesses relevance, correctness, and helpfulness. We instantiate the framework through HealthNotes, a benchmark of 1.2K helpfulness-annotated health notes paired with a fine-tuned helpfulness judge. Experiments on fifteen LLMs reveal an overlooked loophole in current helpfulness evaluation, where stylistic fluency is mistaken for factual accuracy, and demonstrate that our hierarchical evaluation and LLM-augmented generation jointly enhance factual precision and evidence utility. These results point toward a hybrid human-AI governance model that improves both the rigor and timeliness of crowd-sourced fact-checking.
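To make the two complementary modes concrete, here is a hedged sketch of how they could be wired together. The `llm` and `judge` callables are placeholder interfaces and the prompt wording is invented for illustration; the paper's actual prompts and selection logic may differ.

```python
# Illustrative sketch of the two CrowdNotes+ modes named in the abstract:
# (1) evidence-grounded note augmentation and (2) utility-guided note automation.
# Prompts, function names, and the llm/judge callables are assumptions.
from typing import Callable, Sequence

LLM = Callable[[str], str]            # prompt -> completion
Judge = Callable[[str, str], float]   # (post, note) -> helpfulness score

def augment_note(post: str, draft_note: str, evidence: Sequence[str], llm: LLM) -> str:
    """Mode 1: refine an existing crowd note so each claim is anchored to evidence."""
    prompt = (
        "Revise the community note so that every claim is supported by the evidence.\n"
        f"Post: {post}\nDraft note: {draft_note}\n"
        "Evidence:\n" + "\n".join(f"- {e}" for e in evidence) +
        "\nRevised note:"
    )
    return llm(prompt)

def automate_note(post: str, evidence: Sequence[str], llm: LLM, judge: Judge,
                  n_candidates: int = 3) -> str:
    """Mode 2: generate candidate notes and keep the one the helpfulness judge
    scores highest (utility-guided selection)."""
    prompt = (
        "Write a concise, sourced community note for the post below.\n"
        f"Post: {post}\nEvidence:\n" + "\n".join(f"- {e}" for e in evidence) +
        "\nNote:"
    )
    candidates = [llm(prompt) for _ in range(n_candidates)]
    return max(candidates, key=lambda note: judge(post, note))
```

In this reading, mode 1 keeps the human note and strengthens its evidential grounding, while mode 2 produces a note end to end but is steered by the same helpfulness judge used for evaluation.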
Problem

Research questions and friction points this paper is trying to address.

Addressing significant latency in crowd-sourced health misinformation governance
Improving responsiveness during real-world health misinformation surges
Preventing stylistic fluency from being mistaken for factual accuracy in helpfulness evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-augmented framework for faster misinformation governance
Hierarchical evaluation assessing relevance, correctness, and helpfulness
Hybrid human-AI model combining crowd-sourcing with automated generation
Jiaying Wu
National University of Singapore
Natural Language Processing · Data Mining · Mis/Disinformation · Social Computing
Zihang Fu
National University of Singapore
Haonan Wang
National University of Singapore
Fanxiao Li
Yunnan University
Min-Yen Kan
National University of Singapore