Subliminal Effects in Your Data: A General Mechanism via Log-Linearity

📅 2026-02-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes Logit-Linear-Selection (LLS), a method that exploits implicit signals embedded in preference datasets to steer large language models toward latent behaviors, such as specific preferences, cross-lingual responses, or persona shifts, without modifying the explicit content of any datapoint. Challenging purely pointwise, data-centric accounts of training, the study formalizes the hypothesis that latent subpopulations in data exert a consistent influence on model outputs through linear structure in logit space. By curating subsets of generic preference data according to this principle, LLS induces targeted behavioral changes. The approach is validated across multiple mainstream architectures, demonstrating the generality and reproducibility of the underlying logit-space linearity mechanism and offering a perspective beyond pointwise data interpretation for understanding and controlling model behavior.

📝 Abstract
Training modern large language models (LLMs) has become a veritable smorgasbord of algorithms and datasets designed to elicit particular behaviors, making it critical to develop techniques to understand the effects of datasets on the model's properties. This is exacerbated by recent experiments that show datasets can transmit signals that are not directly observable from individual datapoints, posing a conceptual challenge for dataset-centric understandings of LLM training and suggesting a missing fundamental account of such phenomena. Towards understanding such effects, inspired by recent work on the linear structure of LLMs, we uncover a general mechanism through which hidden subtexts can arise in generic datasets. We introduce Logit-Linear-Selection (LLS), a method that prescribes how to select subsets of a generic preference dataset to elicit a wide range of hidden effects. We apply LLS to discover subsets of real-world datasets so that models trained on them exhibit behaviors ranging from having specific preferences, to responding to prompts in a different language not present in the dataset, to taking on a different persona. Crucially, the effect persists for the selected subset, across models with varying architectures, supporting its generality and universality.
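The abstract does not spell out the selection rule, but the core idea, scoring examples by how their effect in logit space aligns with a target behavior direction and keeping the best-aligned subset, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the per-example logit-difference vectors `logit_diffs` and the behavior direction `target_direction` are hypothetical inputs that a real pipeline would have to estimate from the model.

```python
import numpy as np

def select_subset(logit_diffs, target_direction, k):
    """Keep the k examples whose logit-space effect projects most
    strongly onto the target behavior direction.

    logit_diffs      : (n, d) array, one logit-difference vector per example
    target_direction : (d,) array pointing toward the desired behavior
    """
    direction = target_direction / np.linalg.norm(target_direction)
    scores = logit_diffs @ direction          # projection of each example
    top_idx = np.argsort(scores)[::-1][:k]    # indices of the k largest scores
    return top_idx, scores[top_idx]

# Toy demo: 100 examples in an 8-dim logit space (real vocab logits are far larger).
rng = np.random.default_rng(0)
diffs = rng.normal(size=(100, 8))
target = np.zeros(8)
target[3] = 1.0  # hypothetical "behavior" direction
idx, scores = select_subset(diffs, target, k=10)
```

Training on only the selected subset would then, per the paper's hypothesis, nudge the model toward the behavior the direction encodes, even though no individual datapoint mentions it.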
Problem

Research questions and friction points this paper is trying to address.

subliminal effects
large language models
dataset influence
hidden signals
log-linearity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Logit-Linear-Selection
subliminal effects
large language models
log-linearity
dataset selection