🤖 AI Summary
This work addresses the opacity of internal representations in large language models by modeling token embeddings as geometric trajectories. Inspired by the Kakeya conjecture, the authors introduce a "stickiness" constraint to enhance the geometric structure of these representations. They propose two differentiable regularizers, KT-CW and KT-Attn, which jointly incorporate geometric isotropy and attention diversity into interpretability-aware training, a combination the authors present as novel. Experiments on Gemma-3 and Llama-3-8B show that the approach improves geometric desiderata and reduces certain fairness biases while preserving task accuracy, with the most pronounced benefits in medium-scale models.
📝 Abstract
Large language models (LLMs) demonstrate strong performance but often lack transparency. We introduce GeoLAN, a training framework that treats token representations as geometric trajectories and applies stickiness conditions inspired by recent developments related to the Kakeya conjecture. We develop two differentiable regularizers, Katz-Tao Convex Wolff (KT-CW) and Katz-Tao Attention (KT-Attn), that promote isotropy and encourage diverse attention. Our experiments with Gemma-3 (1B, 4B, 12B) and Llama-3-8B show that GeoLAN frequently maintains task accuracy while improving geometric metrics and reducing certain fairness biases. These benefits are most significant in mid-sized models. Our findings reveal scale-dependent trade-offs between geometric precision and performance, suggesting that geometry-aware training is a promising approach to enhancing mechanistic interpretability.
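The abstract does not give the formulas for KT-CW or KT-Attn. As a rough, hedged illustration of the *kind* of objectives being described, the sketch below shows a generic isotropy penalty (spectral flatness of the embedding covariance) and a generic attention-diversity penalty (negative attention entropy). Both function names and forms are hypothetical stand-ins, not the paper's actual regularizers:

```python
import numpy as np

def isotropy_penalty(E):
    """Generic isotropy regularizer (illustrative stand-in for KT-CW).

    E: (n_tokens, d) embedding matrix. Penalizes anisotropy by measuring
    how far the normalized eigenvalue spectrum of the embedding covariance
    deviates from the uniform spectrum (perfect isotropy -> penalty ~ 0).
    """
    E = E - E.mean(axis=0, keepdims=True)        # center the embeddings
    cov = E.T @ E / len(E)                       # (d, d) covariance
    eig = np.linalg.eigvalsh(cov)                # ascending eigenvalues
    eig = eig / eig.sum()                        # normalized spectrum
    uniform = np.full_like(eig, 1.0 / len(eig))
    return float(np.sum((eig - uniform) ** 2))

def attention_diversity_penalty(A, eps=1e-9):
    """Generic attention-diversity regularizer (illustrative stand-in for KT-Attn).

    A: (n_heads, n_queries, n_keys) attention weights, rows summing to 1.
    Returns negative mean entropy, so minimizing it encourages each query
    to spread attention over many keys rather than collapsing to one.
    """
    ent = -np.sum(A * np.log(A + eps), axis=-1)  # per-query entropy
    return float(-ent.mean())
```

In a training loop these terms would be added to the task loss with small weights; the actual KT-CW/KT-Attn constructions, per the abstract, are derived from Katz-Tao-style stickiness conditions rather than these simple spectral and entropy penalties.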