🤖 AI Summary
This work systematically evaluates the feasibility of replacing Transformers with State Space Models (SSMs) for code understanding tasks. To this end, we propose BiGSCoder, the first bidirectional gated SSM architecture specifically designed for code understanding, pretrained via masked language modeling (MLM). BiGSCoder eliminates explicit positional encoding, inherently supporting long-sequence modeling and length extrapolation while achieving higher sample efficiency. Experiments demonstrate that BiGSCoder outperforms Transformer baselines of comparable size on major code understanding benchmarks (e.g., CodeXGLUE, MultiPL-E), attaining state-of-the-art performance using only one-third of the pretraining data, no positional embeddings, and a simplified pretraining pipeline. Our core contributions are: (i) the first bidirectional SSM framework tailored for code understanding; and (ii) empirical evidence that SSMs exhibit superior long-range dependency modeling, higher data efficiency, and better generalization in programming tasks compared to Transformers.
📄 Abstract
We present BiGSCoder, a novel encoder-only bidirectional state-space model (SSM) with a gated architecture, pre-trained for code understanding on a code dataset using masked language modeling. Our work aims to systematically evaluate the capabilities of SSMs on coding tasks relative to traditional transformer architectures, and BiGSCoder is built for this purpose. Through comprehensive experiments across diverse pre-training configurations and code understanding benchmarks, we demonstrate that BiGSCoder outperforms transformer-based models despite using simpler pre-training strategies and much less training data. Our results indicate that BiGSCoder can serve as a more sample-efficient alternative to conventional transformer models. Furthermore, our study shows that SSMs perform better without positional embeddings and can effectively extrapolate to longer sequences during fine-tuning.
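The core architectural idea above can be illustrated with a minimal sketch: two independent linear state-space scans, one left-to-right and one right-to-left, whose outputs are merged by a multiplicative gate, so every position sees context from both sides (as MLM-style pretraining requires) with no positional embeddings. The scalar recurrence parameters and the input-conditioned sigmoid gate below are illustrative stand-ins, not BiGSCoder's actual learned components.

```python
import math

def ssm_scan(x, a=0.9, b=0.1):
    """Diagonal linear SSM recurrence: h_t = a * h_{t-1} + b * x_t.
    (Real gated SSMs learn per-channel a, b; scalars keep the sketch small.)"""
    h, out = 0.0, []
    for x_t in x:
        h = a * h + b * x_t
        out.append(h)
    return out

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bidirectional_gated_ssm(x):
    """Run the scan in both directions and merge with a gate, giving each
    position access to left AND right context without positional encoding."""
    fwd = ssm_scan(x)              # left-to-right context
    bwd = ssm_scan(x[::-1])[::-1]  # right-to-left context, re-aligned
    # Input-conditioned gate: a stand-in for the learned multiplicative
    # gating used in gated SSM blocks.
    return [sigmoid(x_t) * f + (1.0 - sigmoid(x_t)) * b_
            for x_t, f, b_ in zip(x, fwd, bwd)]

seq = [1.0, -0.5, 2.0, 0.0]
print(bidirectional_gated_ssm(seq))
```

Because the recurrence has no fixed-length positional table, the same layer applies unchanged to sequences longer than any seen in pretraining, which is the mechanism behind the length-extrapolation claim.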