🤖 AI Summary
To address the long-range dependency decay and computational overhead inherent in conventional position encodings for Transformers, this paper proposes Complex Positional Encoding (CoPE). CoPE jointly models token semantics and positional information within a complex-valued embedding space: the real part encodes content, while the imaginary part encodes position; a phase-aware attention mechanism is further introduced to explicitly capture position-dependent patterns. Crucially, CoPE is compatible with linear attention, circumventing the sequence-length extrapolation bottleneck caused by explicitly injecting position encodings into attention scores. On the GLUE benchmark, CoPE consistently outperforms RoPE, sinusoidal encoding, and learned position encodings, achieving superior accuracy at lower computational cost. These results support the effectiveness and efficiency of unifying content and position modeling in the complex domain.
📝 Abstract
Recent studies have demonstrated the effectiveness of position encoding in transformer architectures: by incorporating positional information, it provides essential guidance for modeling dependencies between elements at different sequence positions. We introduce CoPE, a lightweight Complex Positional Encoding, a novel architecture that leverages complex-valued embeddings to jointly represent content and positional information. Our approach replaces traditional positional encodings with complex embeddings in which the real part captures semantic content and the imaginary part encodes position. We introduce phase-aware attention in the first layer of the transformer to capture position-dependent patterns, followed by standard attention layers at higher levels. We show that CoPE does not exhibit long-term decay and is compatible with linear attention. Experimental evaluation on the GLUE benchmark suggests that our approach achieves superior performance with lower computational complexity than RoPE, sinusoidal, and learned positional encodings.
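To make the core idea concrete, here is a minimal NumPy sketch of one plausible reading of the abstract, not the paper's actual implementation: tokens are embedded as complex vectors whose real part is the content embedding and whose imaginary part is a sinusoidal-style positional signal (the frequency schedule and `scale` factor are assumptions), and a "phase-aware" attention score is taken as the real part of the complex inner product `q · conj(k)`, which is sensitive to phase (i.e., position) differences.

```python
import numpy as np

def cope_embed(content, positions, scale=0.1):
    """Hypothetical CoPE-style embedding: real part = content, imaginary part = position."""
    d = content.shape[1]
    # Sinusoidal-style frequency schedule (an assumption, not from the paper).
    freqs = 1.0 / (10000 ** (np.arange(d) / d))
    pos_part = np.sin(positions[:, None] * freqs)  # (seq_len, d) positional signal
    return content + 1j * scale * pos_part         # complex-valued embedding

def phase_aware_attention(z, v):
    """Attention whose scores come from a complex inner product q * conj(k).

    The real part of z @ conj(z).T mixes content similarity with a
    position-dependent phase term, so the softmax weights become
    position-sensitive without adding explicit position biases.
    """
    scores = np.real(z @ z.conj().T) / np.sqrt(z.shape[1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v

rng = np.random.default_rng(0)
seq_len, d_model = 5, 8
content = rng.normal(size=(seq_len, d_model))
z = cope_embed(content, np.arange(seq_len))
out = phase_aware_attention(z, content)
print(out.shape)  # (5, 8)
```

Because the positional information lives in the embedding itself rather than in an additive attention bias, the same values pass unchanged through subsequent standard attention layers, which is consistent with the abstract's claim of compatibility with linear attention.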