mdok-style at SemEval-2026 Task 10: Finetuning LLMs for Conspiracy Detection

📅 2026-05-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the detection of conspiracy beliefs in Reddit comments when only a small amount of labeled training data is available. To cope with this scarcity, the authors combine data augmentation with self-training during fine-tuning, adapting an approach that originated in machine-generated text detection. The system fine-tunes the Qwen3-32B large language model as a binary text classifier. On SemEval-2026 Task 10, it placed 8th out of 52 submissions (roughly the top 15%), which indicates that the approach transfers to conspiracy detection in low-resource social media settings.
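The reported placement (8th of 52) can be converted to a percentile with a quick calculation. This is a minimal sketch, assuming the percentile is the share of submissions ranked below the system; the source does not state which formula it used, and `percentile_from_rank` is a hypothetical helper name.

```python
def percentile_from_rank(rank: int, total: int) -> float:
    """Share of submissions ranked strictly below `rank`, in percent.

    Assumed definition; other percentile conventions exist.
    """
    return 100.0 * (total - rank) / total


# 8th place out of 52 submissions: 44 of 52 submissions rank lower.
print(round(percentile_from_rank(8, 52), 1))  # 84.6
```

Under this definition the result rounds to the 85th percentile, and the complementary share (8 of 52) is about 15.4%.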

📝 Abstract

SemEval-2026 Task 10 is focused on conspiracy detection. Specifically, the goal is to detect whether a Reddit comment expresses a conspiracy belief. Our submitted mdok-style system utilizes data augmentation and self-training (to cope with a rather small amount of training data) to finetune the Qwen3-32B model for a binary text-classification task. The submitted system is very competitive, ranking in the 85th percentile (8th out of 52 submissions). The results show that our approach, which originated in machine-generated text detection, can be used for conspiracy detection as well.
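The abstract names self-training without giving details, so the following is only a generic pseudo-labeling sketch, not the authors' procedure. Everything in it is an assumption for illustration: the `self_train` name, the `fit` callback (standing in for whatever finetuning step is used, such as finetuning Qwen3-32B), the confidence `threshold`, and the number of `rounds`.

```python
from typing import Callable, List, Tuple

# (comment text, label): 1 = expresses a conspiracy belief, 0 = does not
Example = Tuple[str, int]
# A trained model maps a text to an estimated P(label == 1).
Predictor = Callable[[str], float]


def self_train(
    labeled: List[Example],
    unlabeled: List[str],
    fit: Callable[[List[Example]], Predictor],
    threshold: float = 0.9,  # hypothetical confidence cut-off
    rounds: int = 3,         # hypothetical number of iterations
) -> Predictor:
    """Generic self-training: pseudo-label confident examples, then retrain."""
    train = list(labeled)
    pool = list(unlabeled)
    predict = fit(train)
    for _ in range(rounds):
        confident: List[Example] = []
        remaining: List[str] = []
        for text in pool:
            p = predict(text)
            if p >= threshold:
                confident.append((text, 1))   # confidently positive
            elif p <= 1.0 - threshold:
                confident.append((text, 0))   # confidently negative
            else:
                remaining.append(text)        # too uncertain, keep in pool
        if not confident:
            break  # nothing new to learn from
        train.extend(confident)
        pool = remaining
        predict = fit(train)  # retrain on gold plus pseudo-labeled data
    return predict
```

Passing the training step in as `fit` keeps the loop independent of the underlying model, so the same sketch applies to a small baseline classifier or to a finetuned LLM.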

Problem

Research questions and friction points this paper is trying to address.

conspiracy detection

Reddit comment

binary text classification

conspiracy belief

Innovation

Methods, ideas, or system contributions that make the work stand out.

data augmentation

self-training

LLM finetuning

conspiracy detection