🤖 AI Summary
Existing analyses of large language model (LLM) output logits rely on task-specific prompts or model-specific architectures, limiting generalizability. Method: We propose a model-agnostic framework that constructs a logits sequence matrix across diverse prompts and responses and reveals its intrinsic low-rank structure; building on this, we formulate a linear-combination generation mechanism grounded in low-rank approximation, enabling cross-prompt response transfer and even zero-shot response synthesis without target prompts. Contribution/Results: We provide theoretical guarantees on representation capacity and learning convergence. Empirical evaluation across multiple state-of-the-art LLMs consistently confirms the pronounced low-rankness of logits matrices, with strong alignment between theoretical predictions and empirical observations. This work establishes the first rigorously justified, model-generalizable low-rank abstraction at the logits layer, advancing fundamental understanding of LLM representations and informing efficient inference paradigms.
📝 Abstract
A major problem in the study of large language models is understanding their inherent low-dimensional structure. We introduce an approach to studying the low-dimensional structure of language models at a model-agnostic level: as sequential probabilistic models. We first empirically demonstrate that a wide range of modern language models exhibit low-rank structure: in particular, matrices built from a model's logits for varying sets of prompts and responses have low approximate rank. We then show that this low-rank structure can be leveraged for generation: in particular, we can generate a response to a target prompt using a linear combination of the model's outputs on unrelated, or even nonsensical, prompts.
On the theoretical front, we observe that studying the approximate rank of language models in the sense discussed above yields a simple universal abstraction whose theoretical predictions parallel our experiments. We then analyze the representation power of the abstraction and give provable learning guarantees.
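The two empirical claims above — that a prompts-by-vocabulary logits matrix has low approximate rank, and that one prompt's logits can be expressed as a linear combination of others' — can be sketched numerically. The snippet below is a minimal illustration, not the paper's method: it uses a synthetic matrix with planted low-rank structure as a stand-in for real model logits, measures approximate rank by singular-value decay, and fits the linear combination by least squares.

```python
import numpy as np

# Hypothetical stand-in for a real logits matrix: rows are logit vectors
# for different prompts, with planted rank-r structure plus small noise.
rng = np.random.default_rng(0)
n_prompts, vocab, r = 40, 500, 5
L = rng.normal(size=(n_prompts, r)) @ rng.normal(size=(r, vocab))
L += 0.01 * rng.normal(size=L.shape)

# Approximate rank: number of singular values above a relative threshold.
s = np.linalg.svd(L, compute_uv=False)
approx_rank = int(np.sum(s > 0.05 * s[0]))

# Cross-prompt transfer: reconstruct the last row (the "target prompt")
# as a linear combination of the remaining rows via least squares.
basis, target = L[:-1], L[-1]
coef, *_ = np.linalg.lstsq(basis.T, target, rcond=None)
rel_err = np.linalg.norm(coef @ basis - target) / np.linalg.norm(target)
print(approx_rank, rel_err)
```

In this toy setup the measured approximate rank matches the planted rank and the reconstruction error is small; for real models, the paper's experiments play the role of this sanity check.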