The Human Labour of Data Work: Capturing Cultural Diversity through World Wide Dishes

📅 2025-02-09
🤖 AI Summary
This paper addresses pervasive cultural biases, the quality deficiencies of web-crawled data, and the systemic invisibility of labour in machine learning datasets. We propose a participatory data construction paradigm that combines a collaborative crowdsourcing framework, culturally situated collection protocols, and qualitative reflexive analysis grounded in practice logs and interviews. Using this paradigm, we co-create World Wide Dishes (WWD), the first high-quality, multimodal food culture dataset built collectively by globally diverse communities. We systematically identify and theorise four categories of critical invisible labour: building community trust, designing for accessible participation, supporting data production, and interpreting the relationship between data and culture. Our contributions include: (1) a reusable, open framework for participatory dataset construction; (2) empirical validation of decentralised, anti-colonial data practices; and (3) interdisciplinary innovation bridging CSCW and ML data governance.

📝 Abstract
We provide a window into the process of constructing a dataset for machine learning (ML) applications by reflecting on the process of building World Wide Dishes (WWD), an image and text dataset consisting of culinary dishes and their associated customs from around the world. WWD takes a participatory approach to dataset creation: community members guide the design of the research process and engage in crowdsourcing efforts to build the dataset. WWD responds to calls in ML to address the limitations of web-scraped Internet datasets with curated, high-quality data incorporating localised expertise and knowledge. Our approach supports decentralised contributions from communities that have not historically contributed to datasets as a result of a variety of systemic factors. We contribute empirical evidence of the invisible labour of participatory design work by analysing reflections from the research team behind WWD. In doing so, we extend computer-supported cooperative work (CSCW) literature that examines the post-hoc impacts of datasets when deployed in ML applications by providing a window into the dataset construction process. We surface four dimensions of invisible labour in participatory dataset construction: building trust with community members, making participation accessible, supporting data production, and understanding the relationship between data and culture. This paper builds upon the rich participatory design literature within CSCW to guide how future efforts to apply participatory design to dataset construction can be designed in a way that attends to the dynamic, collaborative, and fundamentally human processes of dataset creation.
Problem

Research questions and friction points this paper is trying to address.

Capturing cultural diversity in datasets
Addressing limitations of web-scraped datasets
Highlighting invisible labour in dataset construction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Participatory approach to dataset creation
Incorporates localised expertise and knowledge
Supports decentralised community contributions