🤖 AI Summary
This work investigates whether bidirectional attention can overcome the inherent limitations that unidirectional attention imposes on the semantic representation capability of autoregressive large language models (LLMs). We propose the first progressive bidirectional attention integration method tailored to the Llama architecture, enhancing semantic understanding without compromising generative performance. Our approach employs multi-stage fine-tuning that jointly incorporates bidirectional attention with unsupervised and supervised contrastive learning. We systematically evaluate the resulting model on word embeddings, diagnostic probing tasks, and downstream understanding applications, including text similarity and classification. Experimental results demonstrate substantial improvements in semantic encoding capacity, and probing analyses confirm the acquisition of richer, more hierarchical semantic features. Moreover, consistent performance gains are observed across diverse comprehension-oriented benchmarks, underscoring the critical role of attention directionality in representation quality.
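The core architectural change described above, switching a causal LLM's attention to bidirectional attention, amounts to dropping the causal mask so every token can attend to every other token. The following is a minimal PyTorch sketch of this difference (not the paper's implementation; function and variable names are illustrative):

```python
import torch

def attention_weights(q: torch.Tensor, k: torch.Tensor, causal: bool) -> torch.Tensor:
    """Scaled dot-product attention weights with optional causal masking.

    With causal=True, each position attends only to itself and earlier
    positions (as in autoregressive LLMs such as Llama); with
    causal=False, every position attends to every other position
    (bidirectional attention).
    """
    d = q.shape[-1]
    scores = q @ k.transpose(-2, -1) / d**0.5  # (seq, seq) similarity scores
    if causal:
        seq = scores.shape[-1]
        # Mask out strictly-upper-triangular entries: future positions.
        mask = torch.triu(torch.ones(seq, seq, dtype=torch.bool), diagonal=1)
        scores = scores.masked_fill(mask, float("-inf"))
    return torch.softmax(scores, dim=-1)

q = k = torch.randn(4, 8)
causal_w = attention_weights(q, k, causal=True)   # zero above the diagonal
bidir_w = attention_weights(q, k, causal=False)   # dense attention pattern
```

In practice, a "progressive" integration would relax this mask only in selected layers or training stages rather than all at once.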
📝 Abstract
Autoregressive Large Language Models (LLMs) demonstrate exceptional performance in language understanding and generation. However, their adoption for text embedding tasks, along with the analysis of their semantic representations through probing tasks, has lagged due to the constraints of the unidirectional attention mechanism.
This paper explores whether these constraints can be overcome by enabling bidirectional attention in LLMs. We trained different variants of the Llama architecture with additional training stages, progressively enabling bidirectional attention together with unsupervised and supervised contrastive learning.
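The unsupervised contrastive stage mentioned above is commonly realized with an in-batch InfoNCE objective (as in SimCSE-style training): two embeddings of the same sentence form a positive pair, and the other sentences in the batch serve as negatives. A minimal sketch under that assumption (the paper's exact loss and hyperparameters may differ; the temperature value here is illustrative):

```python
import torch
import torch.nn.functional as F

def info_nce_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.05) -> torch.Tensor:
    """In-batch contrastive (InfoNCE) loss over sentence embeddings.

    z1, z2: (batch, dim) embeddings of the same sentences from two
    forward passes. For row i of z1, row i of z2 is the positive pair
    and every other row of z2 is an in-batch negative.
    """
    z1 = F.normalize(z1, dim=-1)
    z2 = F.normalize(z2, dim=-1)
    sim = z1 @ z2.T / temperature       # (batch, batch) cosine similarities
    labels = torch.arange(z1.shape[0])  # positives lie on the diagonal
    return F.cross_entropy(sim, labels)

z = torch.randn(8, 32)
# Simulate a second dropout-perturbed forward pass with small noise.
loss = info_nce_loss(z, z + 0.01 * torch.randn(8, 32))
```

The supervised stage would use the same objective with labeled positive pairs (e.g. premise/entailment pairs) instead of two noisy views of one sentence.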