BITS Pilani at SemEval-2026 Task 9: Structured Supervised Fine-Tuning with DPO Refinement for Polarization Detection

πŸ“… 2026-04-13
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
This study addresses the challenges of online polarization detection in multilingual and multicultural contexts, where rhetorical subtlety, implicit framing, and high annotation costs hinder model performance. To overcome these issues, the authors propose a two-stage optimization approach: first, they perform structured supervised fine-tuning of Qwen 2.5-7B-Instruct using an interpretable slot-filling template that encodes target entities, claim types, expressive manifestations, and supporting rationales; second, they apply annotation-free Direct Preference Optimization (DPO) leveraging automatically generated preference pairs. This method substantially improves both the model’s sensitivity to polarizing discourse and the interpretability of its predictions, increasing recall from 0.5085 to 0.7797 and macro-F1 by ~5 points on the SemEval-2026 POLAR English development set, all without requiring additional human annotations.
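The interpretable slot-filling template described above can be sketched as a structured output format the model is fine-tuned to emit. The slot names and wording below are assumptions for illustration; the page does not reproduce the authors' exact template.

```python
# Hypothetical sketch of the slot-filling training target (slot names and
# phrasing are assumptions, not the authors' exact template).
TEMPLATE = (
    "Target: {target}\n"
    "Claim type: {claim_type}\n"
    "Manifestations: {manifestations}\n"
    "Justification: {justification}\n"
    "Label: {label}"
)

def fill_slots(target, claim_type, manifestations, justification, label):
    """Render one structured supervised fine-tuning target."""
    return TEMPLATE.format(
        target=target,
        claim_type=claim_type,
        manifestations=", ".join(manifestations),
        justification=justification,
        label=label,
    )

example = fill_slots(
    target="political party X",
    claim_type="us-vs-them framing",
    manifestations=["derogatory language", "out-group blame"],
    justification="The post blames the out-group for a crisis.",
    label="polarized",
)
print(example)
```

Because every slot is explicit, a prediction can be audited field by field rather than read as an opaque label.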

πŸ“ Abstract
The POLAR SemEval-2026 Shared Task aims to detect online polarization and focuses on the classification and identification of multilingual, multicultural, and multi-event polarization. Accurate computational detection of online polarization is challenging due to nuanced rhetoric, implicit framing, and the high cost of human-in-the-loop annotation. Building on recent findings that contextual prompting enables large language models to function as strong polarization detectors, we present a two-stage approach for detecting political polarization in social media text that combines structured supervised fine-tuning with Direct Preference Optimization (DPO) refinement. We fine-tune Qwen 2.5-7B-Instruct with LoRA using an interpretable slot-filling template (target, claim type, manifestation checklist, and justification). We then apply DPO with automatically generated preference pairs to reduce costly false negatives. Experiments on the SemEval-2026 POLAR shared task dataset show that preference-based refinement improves accuracy and reduces false negatives without extra annotation. On the English development set, DPO increases recall from 0.5085 to 0.7797 and improves macro-F1 by ~5 points.
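The annotation-free preference pairs mentioned above could be built by pairing a corrected response against the model's own false-negative output on gold-positive training examples. This is a minimal sketch under that assumption; the field names and correction rule are illustrative, not the authors' actual pipeline.

```python
# Hedged sketch: constructing DPO preference pairs to penalize false
# negatives, assuming access to gold labels and the model's own responses.
# Field names and the string-rewrite correction are assumptions.

def build_preference_pairs(examples):
    """On gold-positive examples the model mislabeled, prefer a corrected
    'polarized' response (chosen) over the false-negative one (rejected)."""
    pairs = []
    for ex in examples:
        gold_positive = ex["gold_label"] == "polarized"
        predicted_negative = ex["model_response"].endswith("Label: not polarized")
        if gold_positive and predicted_negative:
            chosen = ex["model_response"].replace(
                "Label: not polarized", "Label: polarized"
            )
            pairs.append({
                "prompt": ex["prompt"],
                "chosen": chosen,
                "rejected": ex["model_response"],
            })
    return pairs

# Toy usage (data is illustrative, not from the dataset).
data = [
    {
        "prompt": "Classify: 'They always ruin everything for us.'",
        "gold_label": "polarized",
        "model_response": "Justification: neutral tone.\nLabel: not polarized",
    },
    {
        "prompt": "Classify: 'Nice weather today.'",
        "gold_label": "not polarized",
        "model_response": "Justification: benign.\nLabel: not polarized",
    },
]
pairs = build_preference_pairs(data)
print(len(pairs))  # only the false-negative example yields a pair
```

Training DPO on such pairs pushes probability mass toward the corrected responses, which is consistent with the reported recall gain without any new human labels.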
Problem

Research questions and friction points this paper is trying to address.

online polarization
polarization detection
multilingual
social media text
political polarization
Innovation

Methods, ideas, or system contributions that make the work stand out.

structured supervised fine-tuning
Direct Preference Optimization
polarization detection
slot-filling template
LoRA