🤖 AI Summary
This work reveals that neural retrievers exhibit a source bias favoring text generated by large language models (LLMs) even when LLM-generated and human-written texts are semantically similar. The study demonstrates for the first time that this bias stems not from model architecture but from imbalances in non-semantic artifacts, such as fluency and term specificity, between the positive and negative samples in human-annotated training data. The authors theoretically prove that contrastive learning inevitably absorbs such imbalances. To mitigate the issue, they propose two complementary strategies: data rectification through training set reconstruction, and embedding-space debiasing via projection along the bias direction. Experiments confirm that the root cause lies in the training data and show that the proposed methods substantially improve the fairness and reliability of retrieval systems.
📝 Abstract
Recent studies show that neural retrievers often display source bias, favoring passages generated by LLMs over human-written ones, even when both are semantically similar. This bias has been considered an inherent flaw of retrievers, raising concerns about the fairness and reliability of modern information access systems. Our work challenges this view by showing that source bias stems from supervision in retrieval datasets rather than the models themselves. We find that non-semantic differences, such as fluency and term specificity, exist between positive and negative documents, mirroring differences between LLM-generated and human-written texts. In the embedding space, the bias direction from negatives to positives aligns with the direction from human-written to LLM-generated texts. We theoretically show that retrievers inevitably absorb the artifact imbalances in the training data during contrastive learning, which leads to their preference for LLM texts. To mitigate the effect, we propose two approaches: 1) reducing artifact differences in training data, and 2) adjusting LLM text vectors by removing their projection onto the bias vector. Both methods substantially reduce source bias. We hope our study alleviates some concerns regarding LLM-generated texts in information access systems.
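The second mitigation, removing the projection of an embedding onto a bias vector, can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function names are hypothetical, and estimating the bias direction as the mean difference between LLM-generated and human-written text embeddings is an assumption for the example.

```python
import numpy as np

def estimate_bias_direction(llm_embs: np.ndarray, human_embs: np.ndarray) -> np.ndarray:
    """Illustrative estimate of a bias direction: the mean difference
    between LLM-generated and human-written text embeddings."""
    return llm_embs.mean(axis=0) - human_embs.mean(axis=0)

def debias_embedding(v: np.ndarray, bias_vec: np.ndarray) -> np.ndarray:
    """Remove the component of embedding v along the bias direction,
    leaving the part of v orthogonal to the bias vector."""
    b = bias_vec / np.linalg.norm(bias_vec)  # unit bias direction
    return v - np.dot(v, b) * b              # subtract the projection

# Example: after debiasing, the embedding has no component along the bias.
v = np.array([1.0, 2.0, 3.0])
b = np.array([0.0, 1.0, 0.0])
print(debias_embedding(v, b))  # → [1. 0. 3.]
```

Since relevance scores are typically inner products of embeddings, removing this component means the retriever can no longer reward the bias direction itself, only the remaining (semantic) dimensions.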