Towards Improved Sentence Representations using Token Graphs

📅 2026-03-03
🤖 AI Summary
Existing pooling methods often overlook the structural relationships among tokens in large language models and are highly susceptible to noisy or irrelevant information, leading to degraded sentence representations. To address this, the paper proposes GLOT, a lightweight, structure-aware pooling module that reformulates pooling over a frozen large language model as relational learning followed by aggregation. GLOT constructs an implicit token-similarity graph and applies a graph neural network with a readout layer to extract robust sentence embeddings. With 20 times fewer trainable parameters than prior approaches, GLOT achieves state-of-the-art performance on the GLUE and MTEB benchmarks, accelerates training by over 100x, and maintains over 97% accuracy even under an extreme stress test in which 90% of tokens are noise.

📝 Abstract
Obtaining a single-vector representation from a Large Language Model's (LLM) token-level outputs is a critical step for nearly all sentence-level tasks. However, standard pooling methods like mean or max aggregation treat tokens as an independent set, discarding the rich relational structure captured by the model's self-attention layers and making them susceptible to signal dilution. To address this, we introduce GLOT, a lightweight, structure-aware pooling module that reframes pooling as relational learning followed by aggregation. Operating on the outputs of a frozen LLM, GLOT first constructs a latent token-similarity graph, then refines token representations with a graph neural network, and finally aggregates them using a readout layer. Experimentally, our approach is remarkably robust and efficient: on a diagnostic stress test where 90% of tokens are random distractors, GLOT maintains over 97% accuracy while baseline methods collapse. Furthermore, it is competitive with state-of-the-art techniques on benchmarks like GLUE and MTEB with 20x fewer trainable parameters and speeds up the training time by over 100x compared with parameter-efficient fine-tuning methods. Supported by a theoretical analysis of its expressive power, our work shows that learning over token graphs is a powerful paradigm for the efficient adaptation of frozen LLMs. Our code is published at https://github.com/ipsitmantri/GLOT.
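The abstract describes a three-step pipeline: build a latent token-similarity graph over the frozen LLM's token outputs, refine token representations with a graph neural network, then aggregate with a readout layer. The minimal NumPy sketch below illustrates that shape under stated assumptions; it is not the paper's implementation (the actual GLOT module has trainable GNN and readout parameters, and the graph construction here, a temperature-softmaxed cosine-similarity adjacency with one ReLU message-passing step and mean readout, is a hypothetical instantiation for illustration).

```python
import numpy as np

def glot_pool_sketch(tokens: np.ndarray, tau: float = 0.1) -> np.ndarray:
    """Structure-aware pooling sketch: tokens (n, d) -> sentence vector (d,).

    Hypothetical, parameter-free stand-in for GLOT's pipeline:
    graph construction -> GNN refinement -> readout.
    """
    # 1. Latent token-similarity graph: cosine similarities between tokens,
    #    row-softmaxed into a dense, normalized adjacency matrix.
    norms = np.linalg.norm(tokens, axis=1, keepdims=True)
    unit = tokens / np.clip(norms, 1e-8, None)
    logits = (unit @ unit.T) / tau
    adj = np.exp(logits - logits.max(axis=1, keepdims=True))
    adj /= adj.sum(axis=1, keepdims=True)

    # 2. One message-passing step: each token aggregates its neighbors,
    #    followed by a ReLU (a trainable weight matrix is omitted here).
    refined = np.maximum(adj @ tokens, 0.0)

    # 3. Readout: mean-pool refined tokens into a single sentence vector.
    return refined.mean(axis=0)
```

A low temperature `tau` sharpens the adjacency toward each token's nearest neighbors, which is one plausible way an aggregator could down-weight random distractor tokens instead of diluting the signal as plain mean pooling does.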
Problem

Research questions and friction points this paper is trying to address.

sentence representation
token-level outputs
pooling methods
relational structure
large language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

token graph
structure-aware pooling
frozen LLM adaptation
graph neural network
sentence representation