🤖 AI Summary
To address the long-range dependency decay and computational overhead inherent in conventional position encodings for Transformers, this paper proposes Complex Positional Encoding (CoPE). CoPE jointly models token semantics and positional information within a complex-valued embedding space: the real part encodes content, while the imaginary part encodes position; a phase-aware attention mechanism is further introduced to explicitly capture position-dependent patterns. Crucially, CoPE is compatible with linear attention, circumventing the sequence-length extrapolation bottleneck caused by explicitly injecting position encodings into attention scores. On the GLUE benchmark, CoPE consistently outperforms RoPE, sinusoidal encoding, and learned position encodings, achieving superior accuracy at lower computational cost. These results support the effectiveness and efficiency of unifying content and position modeling in the complex domain.
📝 Abstract
Recent studies have demonstrated the effectiveness of position encoding in transformer architectures: by incorporating positional information, it provides essential guidance for modeling dependencies between elements at different sequence positions. We introduce CoPE, a lightweight Complex Positional Encoding, a novel architecture that leverages complex-valued embeddings to jointly represent content and positional information. Our approach replaces traditional positional encodings with complex embeddings in which the real part captures semantic content and the imaginary part encodes position. We introduce phase-aware attention in the first layer of the transformer to capture position-dependent patterns, followed by standard attention layers at higher levels. We show that CoPE does not exhibit long-term decay and is compatible with linear attention. Experimental evaluation on the GLUE benchmark suggests that our approach achieves superior performance with lower computational complexity than RoPE, sinusoidal, and learned positional encodings.
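To make the core idea concrete, here is a minimal NumPy sketch of one plausible reading of the abstract, not the paper's actual implementation: tokens are embedded as complex vectors whose real part is the content embedding and whose imaginary part is a sinusoidal-style positional signal (the frequency schedule and `scale` factor are assumptions), and a "phase-aware" attention score is taken as the real part of the complex inner product `q · conj(k)`, which is sensitive to phase (i.e., position) differences.

```python
import numpy as np

def cope_embed(content, positions, scale=0.1):
    """Hypothetical CoPE-style embedding: real part = content, imaginary part = position."""
    d = content.shape[1]
    # Sinusoidal-style frequency schedule (an assumption, not from the paper).
    freqs = 1.0 / (10000 ** (np.arange(d) / d))
    pos_part = np.sin(positions[:, None] * freqs)  # (seq_len, d) positional signal
    return content + 1j * scale * pos_part         # complex-valued embedding

def phase_aware_attention(z, v):
    """Attention whose scores come from a complex inner product q * conj(k).

    The real part of z @ conj(z).T mixes content similarity with a
    position-dependent phase term, so the softmax weights become
    position-sensitive without adding explicit position biases.
    """
    scores = np.real(z @ z.conj().T) / np.sqrt(z.shape[1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v

rng = np.random.default_rng(0)
seq_len, d_model = 5, 8
content = rng.normal(size=(seq_len, d_model))
z = cope_embed(content, np.arange(seq_len))
out = phase_aware_attention(z, content)
print(out.shape)  # (5, 8)
```

Because the positional information lives in the embedding itself rather than in an additive attention bias, the same values pass unchanged through subsequent standard attention layers, which is consistent with the abstract's claim of compatibility with linear attention.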