🤖 AI Summary
This work addresses the data inefficiency of large language models constrained by Chinchilla scaling laws by introducing Semantic Tube Prediction (STP), a novel task that incorporates geometric priors into language modeling. The approach posits that token sequences evolve along geodesics on a semantic manifold and leverages JEPA-style regularization to confine the trajectory of hidden states, thereby improving the signal-to-noise ratio of the training signal without explicit multi-view augmentation while preserving generative diversity. The method substantially outperforms the data-efficiency predictions of existing scaling laws, matching baseline accuracy on the NL-RX-SYNTH dataset with only 1/16 of the training data.
📝 Abstract
Large Language Models (LLMs) obey consistent scaling laws -- empirical power-law fits that predict how loss decreases with compute, data, and parameters. While predictive, these laws are descriptive rather than prescriptive: they characterize typical training, not optimal training. Surprisingly few works have successfully challenged the data-efficiency bounds implied by these laws -- which is our primary focus. To that end, we introduce the Geodesic Hypothesis, positing that token sequences trace geodesics on a smooth semantic manifold and are therefore locally linear. Building on this principle, we propose a novel Semantic Tube Prediction (STP) task, a JEPA-style regularizer that confines hidden-state trajectories to a tubular neighborhood of the geodesic. STP generalizes JEPA to language without requiring explicit multi-view augmentations. We show that this constraint improves the signal-to-noise ratio and, consequently, preserves diversity by preventing trajectory collisions during inference. Empirically, STP allows LLMs to match baseline accuracy with 16$\times$ less training data on the NL-RX-SYNTH dataset, directly violating the data term of Chinchilla-style scaling laws and demonstrating that principled geometric priors can surpass brute-force scaling. Code is available at https://github.com/galilai-group/llm-jepa#stp.
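To make the "tubular neighborhood" idea concrete, the sketch below illustrates one plausible form such a regularizer could take; it is not the paper's implementation. Under the local-linearity assumption, the geodesic segment between two hidden states can be approximated by the straight chord connecting them, and intermediate states are penalized only when they stray beyond a fixed tube radius around that chord. The function name `stp_tube_loss` and the `radius` hyperparameter are illustrative assumptions.

```python
import numpy as np

def stp_tube_loss(hidden: np.ndarray, radius: float = 0.5) -> float:
    """Hypothetical sketch of an STP-style tube penalty.

    hidden: (T, d) array of hidden states along one sequence.
    Uses the chord from the first to the last state as a local proxy
    for the geodesic (the locally-linear assumption), and accumulates
    the squared distance by which each intermediate state exits a
    tube of the given radius around that chord.
    """
    start, end = hidden[0], hidden[-1]
    chord = end - start
    chord_norm2 = float(np.dot(chord, chord)) + 1e-8
    loss = 0.0
    for h in hidden[1:-1]:
        # Project h onto the chord to find its nearest point on the segment.
        t = float(np.dot(h - start, chord)) / chord_norm2
        t = min(max(t, 0.0), 1.0)
        nearest = start + t * chord
        dist = float(np.linalg.norm(h - nearest))
        # Only the distance outside the tube is penalized.
        excess = max(dist - radius, 0.0)
        loss += excess ** 2
    return loss / max(len(hidden) - 2, 1)
```

States lying on (or near) the chord incur zero loss, so the penalty shapes trajectories without collapsing them to a single path, which is one way a tube constraint could regularize while preserving diversity.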