Crowd-SFT: Crowdsourcing for LLM Alignment

📅 2025-06-04
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Traditional supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) rely heavily on small groups of expert annotators, suffering from high costs, annotator bias, and poor scalability. To address these limitations, this paper introduces an open, crowdsourced SFT framework that enables large-scale, low-barrier, and fairly incentivized human feedback collection. The method features: (1) a point-based reward mechanism calibrated against Shapley values, ensuring fair attribution of annotation contributions and scalable incentive alignment; and (2) a multi-model iterative selection framework that accelerates convergence by updating and selecting among candidate models each round. Experiments demonstrate that the framework reduces the distance between the target model's outputs and ideal responses by up to 55% compared to single-model selection. Moreover, the point-based rewards exhibit strong consistency with Shapley values (Spearman's ρ > 0.92), supporting the framework's fairness, robustness, and scalability.
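
To make the fairness claim concrete, below is a minimal sketch of the kind of consistency check the summary describes: exact Shapley values computed over a toy coalition value function, compared against point-based rewards via Spearman's ρ. The value function v, the per-annotator quality scores, and the point totals are all illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch (assumed setup): exact Shapley attribution over a toy
# coalition value function, correlated with hypothetical point rewards.
from itertools import combinations
from math import factorial

from scipy.stats import spearmanr


def shapley_values(players, v):
    """Exact Shapley values: phi_i = sum over subsets S of the other
    players of |S|!(n-|S|-1)!/n! * (v(S plus {i}) - v(S))."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for k in range(n):
            for S in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (v(frozenset(S) | {i}) - v(frozenset(S)))
        phi[i] = total
    return phi


# Hypothetical per-annotator quality scores (assumption, not from the paper).
quality = {"ann1": 0.9, "ann2": 0.4, "ann3": 0.7, "ann4": 0.2}

def v(coalition):
    """Toy coalition value: summed quality with diminishing returns (assumption)."""
    return sum(quality[p] for p in coalition) ** 0.8

players = list(quality)
phi = shapley_values(players, v)

# Hypothetical point totals a platform might have issued (assumption).
points = {"ann1": 120, "ann2": 45, "ann3": 90, "ann4": 30}

rho, _ = spearmanr([phi[p] for p in players], [points[p] for p in players])
print(f"Spearman rho(points, Shapley) = {rho:.2f}")
```

Exact Shapley computation enumerates every coalition of the other annotators, which is exponential in the number of participants, so a crowd-scale system would need sampling-based approximation; the sketch is only meant to show what the reported ρ > 0.92 consistency measures.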

📝 Abstract
Large Language Models (LLMs) increasingly rely on Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) to align model responses with human preferences. While RLHF employs a reinforcement learning approach with a separate reward model, SFT uses human-curated datasets for supervised learning. Both approaches traditionally depend on small, vetted groups of annotators, making them costly, prone to bias, and limited in scalability. We propose an open, crowd-sourced fine-tuning framework that addresses these limitations by enabling broader feedback collection for SFT without extensive annotator training. Our framework promotes incentive fairness via a point-based reward system correlated with Shapley values and guides model convergence through iterative model updates. Our multi-model selection framework demonstrates up to a 55% reduction in target distance over single-model selection, enabling subsequent experiments that validate our point-based reward mechanism's close alignment with Shapley values (a well-established method for attributing individual contributions), thereby supporting fair and scalable participation.
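
As a rough illustration of the multi-model iterative selection loop the abstract describes, the sketch below keeps a small pool of candidate models, updates each on a fresh batch of crowd feedback, and carries the candidate closest to the target forward each round. Each "model" here is a single scalar, and fine_tune, target_distance, and the feedback distribution are hypothetical stand-ins, not the paper's training procedure or distance metric.

```python
# Toy illustration (assumed dynamics) of multi-model iterative selection.
import random


def fine_tune(model, batch):
    """Nudge the scalar model toward the batch mean (stand-in for SFT)."""
    lr = 0.3
    return model + lr * (sum(batch) / len(batch) - model)


def target_distance(model, target=1.0):
    """Distance to an ideal response value of 1.0 (stand-in metric)."""
    return abs(model - target)


random.seed(0)
pool = [random.uniform(-1.0, 1.0) for _ in range(4)]  # candidate models

for step in range(10):
    # Fresh batch of crowd feedback drawn around the ideal target (assumption).
    batch = [random.gauss(1.0, 0.2) for _ in range(16)]
    pool = [fine_tune(m, batch) for m in pool]
    best = min(pool, key=target_distance)
    # Re-seed the pool around the current best candidate before the next round.
    pool = [best] + [best + random.gauss(0.0, 0.05) for _ in range(3)]
    print(f"round {step}: best target distance = {target_distance(best):.4f}")
```

The selection step is the point of the loop: because several updated candidates compete each round, the carried-forward model moves toward the target at least as fast as any single candidate, which matches the intuition behind the reported gain over single-model selection.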
Problem

Research questions and friction points this paper is trying to address.

LLM alignment relies on small, costly groups of vetted annotators
Traditional SFT and RLHF methods lack scalability and incentive fairness
Feedback from narrow annotator pools is prone to bias and limited in diversity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Open crowd-sourced fine-tuning framework
Point-based reward system correlated with Shapley values
Iterative multi-model updates and selection to guide convergence
👥 Authors
Alex Sotiropoulos, Viterbi School of Engineering, University of Southern California, Los Angeles, USA
Sulyab Thottungal Valapu, Viterbi School of Engineering, University of Southern California, Los Angeles, USA
Linus Lei, Viterbi School of Engineering, University of Southern California, Los Angeles, USA
Jared Coleman, Assistant Professor of Computer Science, Loyola Marymount University (cooperative mobile robotics, algorithms, distributed algorithms, task scheduling)
Bhaskar Krishnamachari, Professor of Electrical and Computer Engineering, and Computer Science, USC (Internet of Things, AI, Machine Learning, Blockchain, Connected Vehicles)