🤖 AI Summary
This work addresses the challenge of constructing efficient bidirectional encoders for resource-constrained industrial settings by reconfiguring the attention-free Avey model into a pure encoder architecture. It introduces three key innovations: decoupled static and dynamic parameterizations, a stability-oriented normalization strategy, and neural compression. The proposed approach achieves, for the first time, high-quality bidirectional contextual modeling without attention mechanisms, consistently outperforming four mainstream Transformer-based encoders on standard token-classification and information-retrieval benchmarks. Furthermore, it demonstrates superior scaling and computational efficiency on long-context tasks, offering a compelling alternative to conventional attention-based architectures in scenarios where computational resources are limited.
📝 Abstract
Compact pretrained bidirectional encoders remain the backbone of industrial NLP under tight compute and memory budgets. Their effectiveness stems from self-attention's ability to deliver high-quality bidirectional contextualization with sequence-level parallelism, as popularized by BERT-style architectures. Recently, Avey was introduced as an autoregressive, attention-free alternative that naturally admits an encoder-only adaptation. In this paper, we reformulate Avey for the encoder-only paradigm and propose several innovations to its architecture, including decoupled static and dynamic parameterizations, stability-oriented normalization, and neural compression. Results show that this reformulated architecture compares favorably to four widely used Transformer-based encoders, consistently outperforming them on standard token-classification and information-retrieval benchmarks while scaling more efficiently to long contexts.