🤖 AI Summary
This work systematically evaluates the feasibility of replacing Transformers with State Space Models (SSMs) for code understanding tasks. To this end, we propose BiGSCoder, the first bidirectional gated SSM architecture specifically designed for code understanding, pretrained via masked language modeling (MLM). BiGSCoder eliminates explicit positional encoding, inherently supporting long-sequence modeling and length extrapolation while achieving higher sample efficiency. Experiments demonstrate that BiGSCoder outperforms Transformer baselines of comparable size on major code understanding benchmarks (e.g., CodeXGLUE, MultiPL-E), attaining state-of-the-art performance using only one-third of the pretraining data, no positional embeddings, and a simplified pretraining pipeline. Our core contributions are: (i) the first bidirectional SSM framework tailored for code understanding; and (ii) empirical evidence that SSMs exhibit superior long-range dependency modeling, higher data efficiency, and better generalization in programming tasks compared to Transformers.
📄 Abstract
We present BiGSCoder, a novel encoder-only bidirectional state-space model (SSM) with a gated architecture, pre-trained for code understanding on a code dataset using masked language modeling. Our work aims to systematically evaluate the capabilities of SSMs on coding tasks relative to traditional transformer architectures, and BiGSCoder is built for this purpose. Through comprehensive experiments across diverse pre-training configurations and code understanding benchmarks, we demonstrate that BiGSCoder outperforms transformer-based models despite using simpler pre-training strategies and much less training data. Our results indicate that BiGSCoder can serve as a more sample-efficient alternative to conventional transformer models. Furthermore, our study shows that SSMs perform better without positional embeddings and can effectively extrapolate to longer sequences during fine-tuning.
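The core architectural idea above can be illustrated with a minimal sketch: two independent linear state-space scans, one left-to-right and one right-to-left, whose outputs are merged by a multiplicative gate, so every position sees context from both sides (as MLM-style pretraining requires) with no positional embeddings. The scalar recurrence parameters and the input-conditioned sigmoid gate below are illustrative stand-ins, not BiGSCoder's actual learned components.

```python
import math

def ssm_scan(x, a=0.9, b=0.1):
    """Diagonal linear SSM recurrence: h_t = a * h_{t-1} + b * x_t.
    (Real gated SSMs learn per-channel a, b; scalars keep the sketch small.)"""
    h, out = 0.0, []
    for x_t in x:
        h = a * h + b * x_t
        out.append(h)
    return out

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bidirectional_gated_ssm(x):
    """Run the scan in both directions and merge with a gate, giving each
    position access to left AND right context without positional encoding."""
    fwd = ssm_scan(x)              # left-to-right context
    bwd = ssm_scan(x[::-1])[::-1]  # right-to-left context, re-aligned
    # Input-conditioned gate: a stand-in for the learned multiplicative
    # gating used in gated SSM blocks.
    return [sigmoid(x_t) * f + (1.0 - sigmoid(x_t)) * b_
            for x_t, f, b_ in zip(x, fwd, bwd)]

seq = [1.0, -0.5, 2.0, 0.0]
print(bidirectional_gated_ssm(seq))
```

Because the recurrence has no fixed-length positional table, the same layer applies unchanged to sequences longer than any seen in pretraining, which is the mechanism behind the length-extrapolation claim.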