Learning to Look at the Other Side: A Semantic Probing Study of Word Embeddings in LLMs with Enabled Bidirectional Attention

📅 2025-10-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates whether bidirectional attention can overcome the inherent limitations of unidirectional attention in autoregressive large language models (LLMs) regarding semantic representation capability. We propose the first progressive bidirectional attention integration method tailored for the Llama architecture, enhancing semantic understanding without compromising generative performance. Our approach employs multi-stage fine-tuning that jointly incorporates bidirectional attention with unsupervised and supervised contrastive learning. We systematically evaluate the resulting model across word embeddings, diagnostic probing tasks, and downstream understanding applications—including text similarity and classification. Experimental results demonstrate substantial improvements in semantic encoding capacity; probing analyses confirm the acquisition of richer, more hierarchical semantic features. Moreover, consistent performance gains are observed across diverse comprehension-oriented benchmarks, underscoring the critical role of attention directionality in representation quality.
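To make the core idea concrete: enabling bidirectional attention in a Llama-style decoder amounts to dropping the causal mask, so every token can attend to tokens on both sides. The following numpy sketch is illustrative only (it is not the paper's implementation); the `attention` function, shapes, and the `bidirectional` flag are assumptions for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v, bidirectional=False):
    # Scaled dot-product attention. With bidirectional=False, a causal
    # mask stops each position from attending to later positions, as in
    # a standard autoregressive (Llama-style) decoder layer; setting
    # bidirectional=True removes that mask.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if not bidirectional:
        T = q.shape[0]
        future = np.triu(np.ones((T, T), dtype=bool), k=1)
        scores = np.where(future, -1e9, scores)
    return softmax(scores) @ v

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))  # 4 tokens, 8-dim embeddings

uni = attention(x, x, x, bidirectional=False)
bi = attention(x, x, x, bidirectional=True)
# Under causal masking the first token can only attend to itself, so its
# representation is just its own value vector; with bidirectional
# attention it mixes in information from the tokens that follow it.
```

The last token attends to the full sequence in both modes, so only the earlier positions change representation, which is exactly where richer contextual information becomes available.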

📝 Abstract
Autoregressive Large Language Models (LLMs) demonstrate exceptional performance in language understanding and generation. However, their adoption for text embedding tasks, and the analysis of their semantic representations in probing tasks, have progressed relatively slowly, owing to the constraints of the unidirectional attention mechanism. This paper explores whether such constraints can be overcome by enabling bidirectional attention in LLMs. We tested different variants of the Llama architecture through additional training steps, progressively enabling bidirectional attention together with unsupervised and supervised contrastive learning.
Problem

Research questions and friction points this paper is trying to address.

Overcoming unidirectional attention limitations in LLMs
Analyzing semantic representations through bidirectional attention
Enhancing text embeddings with contrastive learning methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Enabled bidirectional attention in autoregressive LLMs
Applied unsupervised and supervised contrastive learning
Tested modified Llama architecture variants progressively
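The unsupervised contrastive step listed above can be sketched as a SimCSE-style InfoNCE objective: two embeddings of the same sentence (e.g. from two dropout passes) form a positive pair, while the other sentences in the batch act as in-batch negatives. This numpy sketch is an assumption about the general technique, not the paper's exact training code; the function name, batch shapes, and temperature value are illustrative.

```python
import numpy as np

def info_nce_loss(z1, z2, temperature=0.05):
    # SimCSE-style contrastive loss: row i of z1 and row i of z2 are two
    # views of the same sentence (the positive pair); all other rows of
    # z2 serve as in-batch negatives.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    sim = z1 @ z2.T / temperature  # (batch, batch) scaled cosine similarities
    # Cross-entropy with the diagonal (the matching pair) as the target.
    logprobs = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(logprobs))

rng = np.random.default_rng(1)
anchor = rng.standard_normal((8, 16))
aligned = anchor + 0.01 * rng.standard_normal((8, 16))  # near-identical views
unrelated = rng.standard_normal((8, 16))                # random views

low = info_nce_loss(anchor, aligned)      # positives match -> small loss
high = info_nce_loss(anchor, unrelated)   # no alignment -> larger loss
```

The supervised variant follows the same structure, except the positive pair comes from labeled data (e.g. entailment pairs) rather than dropout noise.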
👥 Authors
Zhaoxin Feng — PhD student, Hong Kong Polytechnic University (Interpretability, Computational Linguistics)
Jianfei Ma — Language Science and Technology, The Hong Kong Polytechnic University
Emmanuele Chersoni — Hong Kong Polytechnic University (Computational Linguistics)
Xiaojing Zhao — Language Science and Technology, The Hong Kong Polytechnic University
Xiaoyi Bao — Language Science and Technology, The Hong Kong Polytechnic University