Relative-Based Scaling Law for Neural Language Models

📅 2025-10-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing scaling laws for language models rely heavily on cross-entropy loss, which measures only the absolute probability assigned to correct tokens—ignoring models' relative ranking ability among candidate tokens, a property critical to practical decoding strategies such as greedy decoding. Method: This work introduces a *relative-ranking perspective* on scaling laws, proposing the Relative-Based Probability (RBP) metric to quantify the rank quality of correct tokens within the model's predicted distribution, and deriving a Relative-Based Scaling Law that characterizes the power-law relationship between model scale and correct-token rank. Contribution/Results: The law is empirically validated across four model families, four diverse datasets, and five orders of magnitude in parameter count, demonstrating strong robustness. By shifting focus from absolute likelihood to relative ordering, this framework complements conventional evaluation paradigms, offers a mechanistic explanation for emergent capabilities in large language models, and advances scaling laws toward a unified theoretical foundation.

📝 Abstract
Scaling laws aim to accurately predict model performance across different scales. Existing scaling-law studies almost exclusively rely on cross-entropy as the evaluation metric. However, cross-entropy provides only a partial view of performance: it measures the absolute probability assigned to the correct token but ignores the relative ordering between correct and incorrect tokens. Yet relative ordering is crucial for language models, such as in greedy-decoding scenarios. To address this limitation, we investigate scaling from the perspective of relative ordering. We first propose the Relative-Based Probability (RBP) metric, which quantifies the probability that the correct token is ranked among the top predictions. Building on this metric, we establish the Relative-Based Scaling Law, which characterizes how RBP improves with increasing model size. Through extensive experiments on four datasets and four model families spanning five orders of magnitude, we demonstrate the robustness and accuracy of this law. Finally, we illustrate the broad applicability of this law with two examples: providing a deeper explanation of emergence phenomena and facilitating the search for fundamental theories of scaling laws. In summary, the Relative-Based Scaling Law complements the cross-entropy perspective and contributes to a more complete understanding of scaling large language models, offering valuable insights for both practical development and theoretical exploration.
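The abstract describes RBP as "the probability that the correct token is ranked among the top predictions." The paper's exact formulation is not reproduced on this page, but one natural reading — a sketch, not the authors' implementation — is the empirical fraction of positions where the correct token falls in the model's top-k candidates:

```python
import numpy as np

def rbp_at_k(logits: np.ndarray, targets: np.ndarray, k: int) -> float:
    """Hypothetical RBP@k: fraction of positions where the correct token
    is among the model's top-k predictions.

    logits:  (num_positions, vocab_size) unnormalized scores
    targets: (num_positions,) index of the correct token at each position
    """
    # Rank of each target token = number of vocabulary entries scored
    # strictly higher than it (rank 0 means the model's top prediction).
    target_scores = logits[np.arange(len(targets)), targets]
    ranks = (logits > target_scores[:, None]).sum(axis=1)
    return float((ranks < k).mean())
```

Under this reading, RBP@1 coincides with greedy-decoding accuracy, which is why a ranking-based metric connects more directly to practical decoding than cross-entropy does.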
Problem

Research questions and friction points this paper is trying to address.

Proposes a relative-based metric to evaluate token ranking in language models
Establishes scaling laws based on relative ordering rather than cross-entropy
Provides deeper understanding of model scaling and emergent phenomena
Innovation

Methods, ideas, or system contributions that make the work stand out.

Proposed Relative-Based Probability metric for ranking
Established scaling law based on relative ordering
Validated law across multiple datasets and models
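The claimed power-law relationship between model scale and ranking quality can be checked in the usual way: fit a line in log-log space. The data points below are illustrative placeholders, not the paper's measurements, and treating 1 − RBP as the quantity that decays is an assumption:

```python
import numpy as np

# Hypothetical (parameter count, 1 - RBP) pairs; the paper's actual
# measurements are not reproduced here.
params = np.array([1e7, 1e8, 1e9, 1e10, 1e11])
rbp_gap = np.array([0.40, 0.28, 0.20, 0.14, 0.10])

# Power law (1 - RBP) ≈ a * N**(-b) becomes linear in log space:
# log(gap) = -b * log(N) + log(a).
slope, intercept = np.polyfit(np.log(params), np.log(rbp_gap), 1)
a, b = np.exp(intercept), -slope  # b > 0 if the gap shrinks with scale
```

A fit of this shape also hints at the paper's emergence argument: a smoothly decaying rank gap can still produce a sharp jump in top-1 (greedy) accuracy once the correct token's typical rank crosses the k = 1 threshold.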
Baoqing Yue
Tsinghua University
Jinyuan Zhou
Tsinghua University
Zixi Wei
Tsinghua University
Jingtao Zhan
Tsinghua University
Information Retrieval · Natural Language Processing · AI
Qingyao Ai
Associate Professor, Dept. of CS&T, Tsinghua University
Information Retrieval · Machine Learning
Yiqun Liu
Tsinghua University