Truthful Online Preference Aggregation for LLM Fine-Tuning in Mobile Crowdsourcing

📅 2026-05-21

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses strategic misreporting by workers in mobile crowdsourcing, which causes existing online aggregation pipelines for large language model (LLM) fine-tuning to incur linear regret $\mathcal{O}(T)$. The authors model the interaction between the platform and strategic workers as a dynamic Bayesian game and propose an online weighted aggregation mechanism that adjusts the weight of each worker's feedback according to its accuracy, thereby incentivizing truthful preference reporting. The mechanism is proven to be truthful and to achieve sublinear regret $\mathcal{O}(\sqrt{T})$ over $T$ time slots, and an extension retains the same bound when only limited worker feedback is available per time slot. Experiments on LLM fine-tuning with real-world datasets show significant performance gains over benchmark schemes.
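
For readers unfamiliar with the terminology, regret is often defined by comparing the platform's cumulative loss with that of the best fixed worker in hindsight. The form below is a generic online-learning convention, assumed here for illustration; the paper's exact definition is not reproduced in this summary, and the symbols $\ell_t$ (loss in slot $t$), $\hat{y}_t$ (aggregated output), and $y_{i,t}$ (worker $i$'s report) are hypothetical notation.

$$R(T) = \sum_{t=1}^{T} \ell_t(\hat{y}_t) - \min_{i} \sum_{t=1}^{T} \ell_t(y_{i,t})$$

Under this convention, sublinear regret such as $R(T) = \mathcal{O}(\sqrt{T})$ implies $R(T)/T \to 0$, so the average per-slot loss gap vanishes as $T$ grows, whereas linear regret $\mathcal{O}(T)$ can leave a constant per-slot gap.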

📝 Abstract

To better serve users' demands in mobile applications (e.g., navigation), mobile crowdsourcing platforms can iteratively align large language model (LLM)-generated content (e.g., AI-generated traffic condition predictions) with human feedback collected from crowdsourcing workers (e.g., mobile users). However, workers may strategically misreport their online preference feedback to maximize their influence or payment. Existing pipelines in mobile crowdsourcing (e.g., EM-based weight estimation) fail to identify the most accurate worker in this online setting, resulting in a linear regret $\mathcal{O}(T)$ over $T$ time slots. In this paper, we study truthful online preference aggregation for LLM fine-tuning in mobile crowdsourcing. We formulate a new dynamic Bayesian game to model the multi-agent online learning process between the platform and strategic mobile workers. We propose a novel online weighted aggregation mechanism that dynamically adjusts each worker's weight in the preference aggregation according to their feedback accuracy. We prove that our mechanism ensures truthful feedback from strategic workers and achieves a sublinear regret $\mathcal{O}(\sqrt{T})$ over $T$ time slots. We further extend our mechanism to a challenging scenario with limited worker feedback per time slot, still guaranteeing a sublinear regret $\mathcal{O}(\sqrt{T})$. Experiments on LLM fine-tuning with real-world datasets further demonstrate significant performance gains of our mechanisms over benchmark schemes.
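
The idea of adjusting each worker's weight according to feedback accuracy can be illustrated with a standard multiplicative-weights (Hedge-style) update. This is a minimal sketch and not the authors' mechanism: it assumes binary preferences, assumes the true preference is revealed after every time slot, and ignores incentives entirely. The function names (`aggregate`, `update_weights`, `simulate`) and the worker accuracies are hypothetical.

```python
import math
import random


def aggregate(weights, reports):
    """Weighted vote over binary preference reports (each 0 or 1)."""
    score = sum(w * r for w, r in zip(weights, reports)) / sum(weights)
    return 1 if score >= 0.5 else 0


def update_weights(weights, reports, truth, eta):
    """Multiplicative-weights step: shrink the weight of every wrong worker."""
    return [w * math.exp(-eta * (r != truth)) for w, r in zip(weights, reports)]


def simulate(accuracies, horizon, seed=0):
    """Simulate non-strategic workers with fixed (assumed) accuracies."""
    rng = random.Random(seed)
    n = len(accuracies)
    # Standard Hedge learning rate for a known horizon (an assumption here).
    eta = math.sqrt(8 * math.log(n) / horizon)
    weights = [1.0] * n
    mistakes = 0
    for _ in range(horizon):
        truth = rng.randint(0, 1)
        # Each worker reports the truth with probability equal to its accuracy.
        reports = [truth if rng.random() < a else 1 - truth for a in accuracies]
        mistakes += aggregate(weights, reports) != truth
        weights = update_weights(weights, reports, truth, eta)
    total = sum(weights)
    return [w / total for w in weights], mistakes


if __name__ == "__main__":
    final_weights, mistakes = simulate([0.9, 0.6, 0.55], horizon=2000)
    print("normalized weights:", final_weights)
    print("aggregation mistakes:", mistakes)
```

In this toy setting the weight mass concentrates on the most accurate worker over time. The paper's contribution lies in the part this sketch omits: making such weighting robust when workers can misreport strategically and when only limited feedback arrives per time slot.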

Problem

Research questions and friction points this paper is trying to address.

truthful preference aggregation

online learning

mobile crowdsourcing

strategic workers

LLM fine-tuning

Innovation

Methods, ideas, or system contributions that make the work stand out.

truthful mechanism

online preference aggregation

LLM fine-tuning