Dia-Lingle: A Gamified Interface for Dialectal Data Collection

📅 2025-09-30
🤖 AI Summary
Problem: Dialectal text resources are scarce because dialects are predominantly oral and vary widely across regions, which impedes scalable data collection and integration into language technologies. Method: We propose a gamified crowdsourcing platform that integrates active learning with dynamic difficulty adjustment. It uses a dual-task framework, “text annotation + geospatial matching”, to sustain user engagement and improve annotation quality, and it incorporates a dialect classifier and a real-time feedback system to enable interactive, high-accuracy geotagging. Contribution/Results: Usability evaluation confirms high user satisfaction. Experiments significantly expand a multi-regional dialect corpus; geotagging accuracy improves by 23.6% over baselines. Our framework establishes a scalable, human-in-the-loop paradigm for low-resource dialect NLP data acquisition.

📝 Abstract
Dialects suffer from a scarcity of computational textual resources because they exist predominantly in spoken rather than written form and exhibit remarkable geographical diversity. Collecting dialect data and subsequently integrating it into current language technologies present significant obstacles. Gamification has been shown to make remote data collection easier and feasible on a substantially wider scale. This paper introduces Dia-Lingle, a gamified interface designed to improve and facilitate dialectal data collection tasks such as corpus expansion and dialect labelling. The platform features two key components: the first challenges users to rewrite sentences in their dialects, identifies the dialect with a classifier, and solicits feedback; the second asks users to match sentences to their geographical locations. Dia-Lingle combines active learning with gamified difficulty levels, strategically encouraging prolonged user engagement while efficiently enriching the dialect corpus. A usability evaluation shows high levels of user satisfaction with the interface. Dia-Lingle is available at https://dia-lingle.ivia.ch/, and a demo video at https://youtu.be/0QyJsB8ym64.
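
The abstract does not say how the location-matching task is scored. As a minimal sketch, assuming the game rewards guesses by how close they land to the true location, a score could be derived from great-circle distance. The function names, the 100-point scale, and the 200 km cutoff below are all hypothetical and are not taken from Dia-Lingle.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def match_score(guess, truth, max_points=100, cutoff_km=200.0):
    """Hypothetical scoring rule: full points for an exact match,
    falling linearly to zero at cutoff_km and beyond."""
    distance = haversine_km(*guess, *truth)
    return max(0, round(max_points * (1 - distance / cutoff_km)))
```

A linear falloff keeps the rule easy to explain to players; a real deployment would likely tune the cutoff to the size of the dialect regions involved.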

Problem

Research questions and friction points this paper is trying to address.

Addresses dialect data scarcity through gamified collection interface
Facilitates corpus expansion and dialect labelling via interactive tasks
Combines active learning with gamification for sustained user engagement

Innovation

Methods, ideas, or system contributions that make the work stand out.

Gamified interface for dialectal data collection
Active learning with gamified difficulty levels
Sentence rewriting and geographical matching tasks
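
The page does not describe how active learning and difficulty levels interact. One way to picture the combination is the sketch below, which assumes entropy-based uncertainty sampling over a classifier's predicted dialect probabilities. Both `entropy` and `pick_prompts`, and the idea of mapping player levels to uncertainty bands, are illustrative assumptions rather than the authors' method.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def pick_prompts(pool, level, num_levels=3, batch_size=2):
    """Choose sentences for the next round (hypothetical selection rule).

    pool:  list of (sentence, class_probabilities) pairs, where the
           probabilities come from some dialect classifier.
    level: player level from 1 (easiest) to num_levels (hardest).
    """
    # Rank from most confident (easy) to most uncertain (hard).
    ranked = sorted(pool, key=lambda item: entropy(item[1]))
    # Split the ranking into one band per difficulty level.
    band_size = math.ceil(len(ranked) / num_levels)
    start = (level - 1) * band_size
    band = ranked[start:start + band_size]
    # Within a band, serve the most uncertain items first, since
    # labels for those are the most informative to collect.
    return [sentence for sentence, _ in reversed(band)][:batch_size]
```

Under this assumption, beginners see sentences the classifier is already fairly sure about, while higher levels surface the ambiguous sentences whose labels would help the model most.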