Towards Improved Sentence Representations using Token Graphs

📅 2026-03-03
🤖 AI Summary
Existing pooling methods often overlook the structural relationships among tokens in large language models and are highly susceptible to noisy or irrelevant information, leading to degraded sentence representations. To address this, the paper proposes GLOT, a lightweight, structure-aware pooling module that reformulates pooling over a frozen large language model as relational learning followed by aggregation. GLOT constructs an implicit token-similarity graph and applies a graph neural network with a readout layer to extract robust sentence embeddings. With 20 times fewer trainable parameters than prior approaches, GLOT achieves state-of-the-art performance on the GLUE and MTEB benchmarks, accelerates training by over 100x, and maintains over 97% accuracy even under an extreme stress test in which 90% of tokens are noise.

📝 Abstract
Obtaining a single-vector representation from a Large Language Model's (LLM) token-level outputs is a critical step for nearly all sentence-level tasks. However, standard pooling methods like mean or max aggregation treat tokens as an independent set, discarding the rich relational structure captured by the model's self-attention layers and making them susceptible to signal dilution. To address this, we introduce GLOT, a lightweight, structure-aware pooling module that reframes pooling as relational learning followed by aggregation. Operating on the outputs of a frozen LLM, GLOT first constructs a latent token-similarity graph, then refines token representations with a graph neural network, and finally aggregates them using a readout layer. Experimentally, our approach is remarkably robust and efficient: on a diagnostic stress test where 90% of tokens are random distractors, GLOT maintains over 97% accuracy while baseline methods collapse. Furthermore, it is competitive with state-of-the-art techniques on benchmarks like GLUE and MTEB with 20x fewer trainable parameters and speeds up the training time by over 100x compared with parameter-efficient fine-tuning methods. Supported by a theoretical analysis of its expressive power, our work shows that learning over token graphs is a powerful paradigm for the efficient adaptation of frozen LLMs. Our code is published at https://github.com/ipsitmantri/GLOT.
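The abstract describes a three-step pipeline: build a latent token-similarity graph over the frozen LLM's token outputs, refine token representations with a graph neural network, then aggregate with a readout layer. The minimal NumPy sketch below illustrates that shape under stated assumptions; it is not the paper's implementation (the actual GLOT module has trainable GNN and readout parameters, and the graph construction here, a temperature-softmaxed cosine-similarity adjacency with one ReLU message-passing step and mean readout, is a hypothetical instantiation for illustration).

```python
import numpy as np

def glot_pool_sketch(tokens: np.ndarray, tau: float = 0.1) -> np.ndarray:
    """Structure-aware pooling sketch: tokens (n, d) -> sentence vector (d,).

    Hypothetical, parameter-free stand-in for GLOT's pipeline:
    graph construction -> GNN refinement -> readout.
    """
    # 1. Latent token-similarity graph: cosine similarities between tokens,
    #    row-softmaxed into a dense, normalized adjacency matrix.
    norms = np.linalg.norm(tokens, axis=1, keepdims=True)
    unit = tokens / np.clip(norms, 1e-8, None)
    logits = (unit @ unit.T) / tau
    adj = np.exp(logits - logits.max(axis=1, keepdims=True))
    adj /= adj.sum(axis=1, keepdims=True)

    # 2. One message-passing step: each token aggregates its neighbors,
    #    followed by a ReLU (a trainable weight matrix is omitted here).
    refined = np.maximum(adj @ tokens, 0.0)

    # 3. Readout: mean-pool refined tokens into a single sentence vector.
    return refined.mean(axis=0)
```

A low temperature `tau` sharpens the adjacency toward each token's nearest neighbors, which is one plausible way an aggregator could down-weight random distractor tokens instead of diluting the signal as plain mean pooling does.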
Problem

Research questions and friction points this paper is trying to address.

sentence representation
token-level outputs
pooling methods
relational structure
large language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

token graph
structure-aware pooling
frozen LLM adaptation
graph neural network
sentence representation