BiGSCoder: State Space Model for Code Understanding

📅 2025-05-02
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This work systematically evaluates whether State Space Models (SSMs) can replace Transformers for code understanding tasks. To this end, we propose BiGSCoder, the first bidirectional gated SSM architecture designed specifically for code understanding, pretrained via masked language modeling (MLM). BiGSCoder omits explicit positional encoding, which inherently supports long-sequence modeling and length extrapolation while improving sample efficiency. Experiments show that BiGSCoder outperforms Transformer baselines of comparable size on major code understanding benchmarks (e.g., CodeXGLUE, MultiPL-E), reaching state-of-the-art performance with only one-third of the pretraining data, no positional embeddings, and a simplified pretraining pipeline. Core contributions: (i) the first bidirectional SSM framework tailored for code understanding; and (ii) empirical evidence that SSMs offer stronger long-range dependency modeling, higher data efficiency, and better generalization on programming tasks than Transformers.
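The summary describes the core architectural idea: a linear state-space recurrence run in both directions over the token sequence, combined with a multiplicative gate, and no positional embeddings (the recurrence itself carries order information). A minimal numpy sketch of that idea is below; the parameter names and the diagonal-SSM simplification are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def ssm_scan(x, a, b):
    """Diagonal linear state-space recurrence: h_t = a * h_{t-1} + b * x_t."""
    h = np.zeros_like(x[0])
    out = []
    for x_t in x:
        h = a * h + b * x_t
        out.append(h)
    return np.stack(out)

def bidirectional_gated_ssm_block(x, a, b, w_gate):
    """Sketch of a bidirectional gated SSM block (names are hypothetical):
    run the scan forward and backward, concatenate the two state streams,
    then apply a sigmoid gate. Order is encoded by the recurrence itself,
    so no positional embeddings are added."""
    fwd = ssm_scan(x, a, b)              # left-to-right states, shape (T, D)
    bwd = ssm_scan(x[::-1], a, b)[::-1]  # right-to-left states, re-aligned
    h = np.concatenate([fwd, bwd], axis=-1)      # (T, 2D)
    gate = 1.0 / (1.0 + np.exp(-(h @ w_gate)))   # sigmoid gating
    return gate * h

rng = np.random.default_rng(0)
T, D = 6, 4
x = rng.standard_normal((T, D))
a = 0.9 * np.ones(D)   # decay near 1 retains long-range context
b = np.ones(D)
w_gate = rng.standard_normal((2 * D, 2 * D))
y = bidirectional_gated_ssm_block(x, a, b, w_gate)
print(y.shape)  # (6, 8)
```

Because each direction's scan is a single linear recurrence, every output position can depend on the entire sequence in both directions, which is what makes an encoder-only (bidirectional) SSM possible.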

๐Ÿ“ Abstract
We present BiGSCoder, a novel encoder-only bidirectional state-space model (SSM) featuring a gated architecture, pre-trained for code understanding on a code dataset using masked language modeling. Our work aims to systematically evaluate SSMs' capabilities in coding tasks compared to traditional transformer architectures; BiGSCoder is built for this purpose. Through comprehensive experiments across diverse pre-training configurations and code understanding benchmarks, we demonstrate that BiGSCoder outperforms transformer-based models, despite utilizing simpler pre-training strategies and much less training data. Our results indicate that BiGSCoder can serve as a more sample-efficient alternative to conventional transformer models. Furthermore, our study shows that SSMs perform better without positional embeddings and can effectively extrapolate to longer sequences during fine-tuning.
Problem

Research questions and friction points this paper is trying to address.

Evaluating SSMs' capabilities in code tasks vs transformers
Developing BiGSCoder for efficient code understanding
Testing SSMs' performance without positional embeddings
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bidirectional state-space model with gated architecture
Pre-trained using masked language modeling
Outperforms transformers with less data
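The innovation list notes that the model is pretrained with masked language modeling. A minimal BERT-style masking sketch over code tokens is shown below, assuming a simple 15% mask rate; this is an illustration of the general MLM objective, not the paper's exact pipeline.

```python
import random

MASK = "<mask>"

def mlm_mask(tokens, mask_prob=0.15, seed=0):
    """BERT-style MLM corruption: replace ~15% of tokens with a mask
    symbol; the model is trained to reconstruct the originals at the
    masked positions only."""
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(tok)   # target = the original token
        else:
            masked.append(tok)
            labels.append(None)  # no loss at unmasked positions
    return masked, labels

code = "def add ( a , b ) : return a + b".split()
masked, labels = mlm_mask(code)
```

A bidirectional encoder suits this objective: predicting a masked token uses context from both sides of the mask.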