🤖 AI Summary
This study investigates whether morphological relations, such as inflectional paradigms, are linearly decodable from the latent space of large language models. To address this, the authors propose a relation reconstruction method grounded in intermediate-layer subject representations and cross-layer affine transformations: transformation matrices are derived analytically from model Jacobians, and target forms are reconstructed with a two-part affine approximation. Experiments across multilingual benchmarks and architectures, including LLaMA, Phi, and Qwen, show that the method reconstructs inflected forms with roughly 90% faithfulness, indicating that morphological relations are strongly linearly decodable within model subspaces and that this decodability generalizes across architectures and languages. The core contribution is systematic empirical evidence that complex morphological transformations can be accurately approximated by a small set of cross-layer linear operations, supporting the interpretability and structured semantic geometry of internal model representations.
📝 Abstract
A two-part affine map has been found to be a good approximation for transformer computations over certain subject-object relations. Adapting the Bigger Analogy Test Set, we show that the linear transformation Ws, where s is a middle-layer representation of a subject token and W is derived from model derivatives, also accurately reproduces final object states for many relations. This linear technique achieves 90% faithfulness on morphological relations, and we report similar findings across languages and models. Our findings indicate that some conceptual relationships in language models, such as morphology, are readily interpretable from latent space and are sparsely encoded by cross-layer linear transformations.
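The two-part affine approximation described above can be sketched in a few lines: the transformer computation F that carries a mid-layer subject state s to the final object state o is replaced by o ≈ Ws + b, where W is the Jacobian of F at a reference subject state and b is the matching bias term. The sketch below is illustrative only, assuming a small random MLP as a stand-in for the layers between the middle and final layer of an actual LLM; all names and shapes are assumptions, not the paper's implementation.

```python
# First-order (two-part) affine approximation of a nonlinear map F:
#   o = F(s)  ≈  W s + b,  with W = dF/ds at s0 and b = F(s0) - W s0.
# A toy MLP stands in for the transformer layers between the mid-layer
# subject representation and the final object representation.
import torch

torch.manual_seed(0)
d = 16  # toy hidden size

# Stand-in for the transformer computation from mid layer to final layer.
F = torch.nn.Sequential(
    torch.nn.Linear(d, d),
    torch.nn.Tanh(),
    torch.nn.Linear(d, d),
)

s0 = torch.randn(d)  # reference mid-layer subject representation

# W is the Jacobian of F at s0; b makes the affine map exact at s0.
W = torch.autograd.functional.jacobian(lambda s: F(s), s0)
b = F(s0) - W @ s0

# For subjects near s0, the affine map closely reproduces F.
s = s0 + 1e-3 * torch.randn(d)
o_true = F(s)
o_lin = W @ s + b
print(torch.allclose(o_true, o_lin, atol=1e-4))
```

For small perturbations of the subject state the first-order error is tiny; the paper's finding is that for morphological relations this kind of affine map stays faithful even across distinct subject tokens, not just in a local neighborhood.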