🤖 AI Summary
This work addresses a limitation of the Community Notes scoring algorithm. After it removes ideological bias, it treats all raters equally, which slows consensus and leaves it vulnerable to noisy or strategic ratings. The authors propose Quality-Sensitive Matrix Factorization (QSMF), which, for the first time, models each rater's quality sensitivity as a learnable parameter. This lets QSMF adjust each rater's influence dynamically without requiring external ground truth. Built on a matrix factorization framework, QSMF jointly optimizes note quality, rater ideology, and quality-sensitivity parameters. Experiments show that QSMF matches baseline accuracy with 26–40% fewer ratings on high-traffic real-world notes. It also substantially reduces quality estimation error under semi-synthetic attacks and in synthetic data with noisy raters, and it identifies reliable raters with an AUC above 0.94. Together, these results indicate gains in both robustness and sample efficiency.
📝 Abstract
Community Notes is X's crowdsourced fact-checking program: contributors write short notes that add context to potentially misleading posts, and other contributors rate whether those notes are helpful. Its algorithm uses a matrix factorization model to separate ideology from note quality, so notes are surfaced only when they receive support across ideological lines. After ideology is accounted for, however, the model gives all raters equal influence on quality estimates. This slows consensus formation and leaves the quality estimate vulnerable to noisy or strategic raters. We propose Quality-Sensitive Matrix Factorization (QSMF), which uses a per-rater quality-sensitivity parameter \(\hat{\rho}_i\) estimated jointly with all other parameters. This connects QSMF to peer prediction: without external ground truth, it gives more influence to raters whose ideology-adjusted ratings are more consistent with the note-quality estimates learned from all the ratings.
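The abstract does not give QSMF's equations, so what follows is only a sketch under stated assumptions. The published Community Notes model predicts a rating as a global intercept plus a rater intercept, a note intercept (read as note quality), and a product of rater and note ideology factors. Here we *assume* QSMF scales the note-quality term by a per-rater \(\rho_i\), giving \(\hat{y}_{in} = \mu + b_i + \rho_i q_n + f_i g_n\). The sketch fits that hypothetical form by alternating ridge regressions and shrinks \(\rho_i\) toward 1, the equal-influence baseline. This may not be the authors' optimizer. The function name `fit_qsmf`, the penalty values, the continuous ratings, and the synthetic "good"/"bad" raters are all illustrative assumptions.

```python
import numpy as np


def fit_qsmf(raters, notes, y, n_raters, n_notes, iters=100,
             lam_b=0.1, lam_rho=1.0, lam_f=0.1, lam_note=0.1, seed=0):
    """Hypothetical QSMF-style model, fit by alternating ridge regressions:
    y_in ~ mu + b_i + rho_i * q_n + f_i * g_n  (1-D ideology factor)."""
    rng = np.random.default_rng(seed)
    mu = float(y.mean())
    b = np.zeros(n_raters)                  # rater intercepts
    rho = np.ones(n_raters)                 # quality sensitivity; 1 = baseline
    f = rng.normal(0.0, 0.1, n_raters)      # rater ideology
    q = np.zeros(n_notes)                   # note quality (note intercept)
    g = rng.normal(0.0, 0.1, n_notes)       # note ideology
    # Assumed ridge penalties: rho_i is shrunk toward 1, everything else toward 0.
    rater_pen = np.diag([lam_b, lam_rho, lam_f])
    rater_prior = np.array([0.0, 1.0, 0.0])
    note_pen = np.diag([lam_note, lam_note])
    for _ in range(iters):
        # Rater step: regress each rater's ratings on [1, q_n, g_n].
        for i in range(n_raters):
            m = raters == i
            X = np.column_stack([np.ones(m.sum()), q[notes[m]], g[notes[m]]])
            t = y[m] - mu
            b[i], rho[i], f[i] = np.linalg.solve(
                X.T @ X + rater_pen, X.T @ t + rater_pen @ rater_prior)
        # Note step: regress each note's ratings on [rho_i, f_i]; raters with
        # small rho_i contribute little to q_n.
        for n in range(n_notes):
            m = notes == n
            X = np.column_stack([rho[raters[m]], f[raters[m]]])
            t = y[m] - mu - b[raters[m]]
            q[n], g[n] = np.linalg.solve(X.T @ X + note_pen, X.T @ t)
        # Global intercept from the current residuals.
        mu = float(np.mean(y - b[raters] - rho[raters] * q[notes]
                           - f[raters] * g[notes]))
    return mu, b, rho, f, q, g


# Synthetic check: 20 "good" raters track quality, 20 "bad" raters ignore it.
rng = np.random.default_rng(1)
n_raters, n_notes = 40, 60
true_rho = np.r_[np.ones(20), np.zeros(20)]
true_q = rng.normal(size=n_notes)
true_g = rng.normal(size=n_notes)
true_f = rng.normal(size=n_raters)
true_b = rng.normal(0.0, 0.2, n_raters)
raters, notes = np.nonzero(rng.random((n_raters, n_notes)) < 0.6)
y = (true_b[raters] + true_rho[raters] * true_q[notes]
     + true_f[raters] * true_g[notes] + rng.normal(0.0, 0.3, raters.size))

mu, b, rho, f, q, g = fit_qsmf(raters, notes, y, n_raters, n_notes)
print("mean rho, good raters:", rho[:20].mean())
print("mean rho, bad raters: ", rho[20:].mean())
print("corr(q, true_q):", np.corrcoef(q, true_q)[0, 1])
```

In the note step, each rater's residual enters the normal equations scaled by \(\rho_i\). A rater whose estimate is near 0 therefore barely moves \(q_n\), and with every \(\rho_i\) fixed at 1 the update reduces to the equal-influence baseline.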
We evaluate QSMF on 45M ratings of 365K notes from the six months before the 2024 U.S. presidential election. Split-half tests confirm that quality sensitivity is a stable, empirically recoverable rater trait. On high-traffic notes, QSMF requires 26–40% fewer ratings to match the baseline's accuracy. In semi-synthetic coordinated attacks on notes of opposing ideology, QSMF substantially reduces the displacement of the targeted notes' quality estimates relative to the baseline. In synthetic data with known ground truth, \(\hat{\rho}_i\) separates good from bad raters with an AUC above 0.94, and QSMF recovers the true note qualities with much lower error when bad raters are present. These gains come from a single additional scalar parameter per rater, with no external ground truth and no manual moderation.
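The AUC reported for \(\hat{\rho}_i\) can be read as the probability that a randomly chosen good rater receives a higher estimate than a randomly chosen bad rater, with ties counting one half. The sketch below computes that statistic directly, assuming good/bad labels are known, as they are in the synthetic setting. The function name `rater_auc` is illustrative, and the pairwise O(P·N) loop favors clarity over speed.

```python
def rater_auc(scores, labels):
    """AUC of `scores` (e.g. estimated rho_i) for separating raters labeled 1
    (good) from raters labeled 0 (bad): the fraction of good/bad pairs in
    which the good rater scores higher, counting ties as 0.5."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one good and one bad rater")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(rater_auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))  # perfect separation
print(rater_auc([0.9, 0.2, 0.3, 0.1], [1, 1, 0, 0]))  # one misordered pair
```

For large rater pools, a rank-based computation or an existing routine such as scikit-learn's `roc_auc_score` gives the same value without the quadratic pair loop.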