🤖 AI Summary
This study addresses a gap in scaling-law research: how factual recall in large language models depends on model scale and on the topical composition of the training data. By evaluating 38 models on over 8,900 academic citations, the work establishes the first sigmoidal scaling law for factual recall, in which accuracy follows a sigmoid of a log-linear combination of model parameter count and topic frequency in the training corpus. This law explains 60% of performance variance across 16 dense models from four families and 74–94% within individual families. To account for this pattern, the authors propose a superposition-inspired mechanism in which recall is gated by a signal-to-noise ratio. Combining automated citation verification, large-scale evaluation, and statistical modeling, the study offers a framework for predicting factuality in language models.
📝 Abstract
While scaling laws govern aggregate large language model performance, no scaling law has linked factual recall to both model size and training-data composition. We evaluated 38 models on over 8,900 scholarly references, each checked by an automated reference verification system. Recall quality follows a sigmoid in a log-linear combination of model parameter count and topic representation in training data. These two variables alone explain 60% of the variance across 16 dense models from four families, rising to 74–94% within individual families. The form matches a superposition-inspired account in which recall is gated by a signal-to-noise ratio: signal strength scales with concept frequency and the noise floor with model capacity.
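The functional form described above, a sigmoid applied to a log-linear combination of parameter count and topic representation, can be sketched as follows. This is a minimal illustration, not the authors' fitted model: the function name `recall_probability`, the coefficients `a`, `b`, `c`, the base-10 logarithm, and the example inputs are all assumptions chosen for readability, and the abstract does not report fitted values.

```python
import math


def recall_probability(n_params, topic_freq, a, b, c):
    """Sigmoid of a log-linear combination of model size and topic frequency.

    n_params   : model parameter count (> 0)
    topic_freq : topic representation in the training data (> 0)
    a, b, c    : hypothetical coefficients; the abstract gives no fitted values
    """
    # Log-linear predictor: both variables enter through their logarithms.
    z = a * math.log10(n_params) + b * math.log10(topic_freq) + c
    # Logistic squashing maps the predictor to a recall score in (0, 1).
    return 1.0 / (1.0 + math.exp(-z))


# Illustrative coefficients only (assumed positive so that recall rises with
# both model size and topic frequency).
A, B, C = 1.0, 1.0, -5.0

small_rare = recall_probability(1e9, 1e-4, A, B, C)
large_rare = recall_probability(1e11, 1e-4, A, B, C)
small_common = recall_probability(1e9, 1e-2, A, B, C)
```

Under these assumed coefficients, increasing either the parameter count or the topic frequency moves the predictor `z` to the right along the sigmoid, so recall saturates toward 1 for large models on well-represented topics and toward 0 for small models on rare ones.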