🤖 AI Summary
Existing analyses of large language model (LLM) output logits rely on task-specific prompts or model-specific architectures, limiting generalizability. Method: We propose a model-agnostic framework that constructs a logits sequence matrix across diverse prompts and responses and reveals its intrinsic low-rank structure; building on this, we formulate a linear-combination generation mechanism grounded in low-rank approximation, enabling cross-prompt response transfer and even zero-shot response synthesis without target prompts. Contribution/Results: We provide theoretical guarantees on representation capacity and learning convergence. Empirical evaluation across multiple state-of-the-art LLMs consistently confirms the pronounced low-rankness of logits matrices, with strong alignment between theoretical predictions and empirical observations. This work establishes the first rigorously justified, model-generalizable low-rank abstraction at the logits layer, advancing fundamental understanding of LLM representations and informing efficient inference paradigms.
📝 Abstract
A major problem in the study of large language models is understanding their inherent low-dimensional structure. We introduce an approach to studying the low-dimensional structure of language models at a model-agnostic level: as sequential probabilistic models. We first empirically demonstrate that a wide range of modern language models exhibit low-rank structure: in particular, matrices built from a model's logits for varying sets of prompts and responses have low approximate rank. We then show that this low-rank structure can be leveraged for generation: in particular, we can generate a response to a target prompt using a linear combination of the model's outputs on unrelated, or even nonsensical, prompts.
On the theoretical front, we observe that studying the approximate rank of language models in the sense discussed above yields a simple universal abstraction whose theoretical predictions parallel our experiments. We then analyze the representation power of the abstraction and give provable learning guarantees.
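The two empirical claims above — that a prompts-by-vocabulary logits matrix has low approximate rank, and that one prompt's logits can be expressed as a linear combination of others' — can be sketched numerically. The snippet below is a minimal illustration, not the paper's method: it uses a synthetic matrix with planted low-rank structure as a stand-in for real model logits, measures approximate rank by singular-value decay, and fits the linear combination by least squares.

```python
import numpy as np

# Hypothetical stand-in for a real logits matrix: rows are logit vectors
# for different prompts, with planted rank-r structure plus small noise.
rng = np.random.default_rng(0)
n_prompts, vocab, r = 40, 500, 5
L = rng.normal(size=(n_prompts, r)) @ rng.normal(size=(r, vocab))
L += 0.01 * rng.normal(size=L.shape)

# Approximate rank: number of singular values above a relative threshold.
s = np.linalg.svd(L, compute_uv=False)
approx_rank = int(np.sum(s > 0.05 * s[0]))

# Cross-prompt transfer: reconstruct the last row (the "target prompt")
# as a linear combination of the remaining rows via least squares.
basis, target = L[:-1], L[-1]
coef, *_ = np.linalg.lstsq(basis.T, target, rcond=None)
rel_err = np.linalg.norm(coef @ basis - target) / np.linalg.norm(target)
print(approx_rank, rel_err)
```

In this toy setup the measured approximate rank matches the planted rank and the reconstruction error is small; for real models, the paper's experiments play the role of this sanity check.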