🤖 AI Summary
This study investigates NLP practitioners' perceptions of, and practical challenges with, data fairness across the dataset development, annotation, and deployment lifecycle. Drawing on a 2024 mixed-methods study (surveys, focus groups, and socio-technical analysis) with U.S.-based practitioners, it surfaces structural tensions between commercial imperatives and fairness commitments. Its key contributions are threefold: first, it adopts a practitioner-centered lens to bridge technical practice, organizational governance, and policy frameworks (e.g., the U.S. AI Bill of Rights), while critically exposing "diversity washing" as a performative fairness strategy; second, it proposes a participatory governance model that grants practitioners decision-making autonomy and embeds community-informed consent mechanisms; and third, it advocates institutionalized support structures to operationalize accountability. The findings provide empirical grounding and multi-level governance recommendations for building responsible, auditable NLP data workflows.
📝 Abstract
While research has focused on surfacing and auditing algorithmic bias to ensure equitable AI development, less is known about how NLP practitioners, those directly involved in dataset development, annotation, and deployment, perceive and navigate issues of NLP data equity. This study is among the first to center practitioners' perspectives, linking their experiences to a multi-scalar AI governance framework and advancing participatory recommendations that bridge technical, policy, and community domains. Drawing on a 2024 questionnaire and focus group, we examine how U.S.-based NLP data practitioners conceptualize fairness, contend with organizational and systemic constraints, and engage with emerging governance efforts such as the U.S. AI Bill of Rights. Findings reveal persistent tensions between commercial objectives and equity commitments, alongside calls for more participatory and accountable data workflows. We critically engage debates on data diversity and diversity washing, arguing that improving NLP equity requires structural governance reforms that support practitioner agency and community consent.