InCoder-32B: Code Foundation Model for Industrial Scenarios

📅 2026-03-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the significant performance degradation of existing large code models in industrial settings characterized by strong hardware semantics, domain-specific language structures, and stringent resource constraints. To bridge this gap, we propose the first industrial-scale unified code foundation model with 32 billion parameters, spanning critical domains including chip design, GPU kernel optimization, embedded systems, compiler optimization, and 3D modeling. The model is trained from scratch, integrating general-purpose code pretraining, curated industrial code annealing, progressive long-context expansion from 8K to 128K tokens, and execution-based post-training strategies. Experimental results demonstrate competitive performance across 14 general-purpose benchmarks and establish state-of-the-art open-source baselines on nine industrial benchmarks across four key domains.

📝 Abstract
Recent code large language models have achieved remarkable progress on general programming tasks. Nevertheless, their performance degrades significantly in industrial scenarios that require reasoning about hardware semantics, specialized language constructs, and strict resource constraints. To address these challenges, we introduce InCoder-32B (Industrial-Coder-32B), the first 32B-parameter code foundation model unifying code intelligence across chip design, GPU kernel optimization, embedded systems, compiler optimization, and 3D modeling. By adopting an efficient architecture, we train InCoder-32B from scratch with general code pre-training, curated industrial code annealing, mid-training that progressively extends context from 8K to 128K tokens with synthetic industrial reasoning data, and post-training with execution-grounded verification. We conduct extensive evaluation on 14 mainstream general code benchmarks and 9 industrial benchmarks spanning 4 specialized domains. Results show InCoder-32B achieves highly competitive performance on general tasks while establishing strong open-source baselines across industrial domains.
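The abstract mentions post-training with execution-grounded verification, i.e. keeping only generated code that actually runs correctly. As a rough illustration only (the function name, subprocess-based sandboxing, and timeout are assumptions, not details from the paper), such a filter might look like:

```python
def passes_execution_check(solution_code: str, test_code: str,
                           timeout_s: float = 5.0) -> bool:
    """Run a candidate solution plus its unit tests in a fresh Python
    subprocess; accept the sample only if the process exits cleanly."""
    import os
    import subprocess
    import sys
    import tempfile

    # Write solution and tests into one throwaway script.
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(solution_code + "\n" + test_code + "\n")
        path = f.name
    try:
        result = subprocess.run([sys.executable, path],
                                capture_output=True, timeout=timeout_s)
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        # Hanging or non-terminating code is rejected, not waited on.
        return False
    finally:
        os.remove(path)

# A correct sample passes; a buggy one fails its assertion and is filtered out.
good = passes_execution_check("def add(a, b):\n    return a + b",
                              "assert add(2, 3) == 5")
bad = passes_execution_check("def add(a, b):\n    return a - b",
                             "assert add(2, 3) == 5")
```

A production pipeline would add stronger isolation (containers, resource limits) than this sketch shows.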
Problem

Research questions and friction points this paper is trying to address.

industrial code
hardware semantics
resource constraints
specialized language constructs
code foundation model
Innovation

Methods, ideas, or system contributions that make the work stand out.

code foundation model
industrial code reasoning
long-context training
execution-grounded verification
domain-specific code intelligence
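The "long-context training" point above refers to the progressive 8K-to-128K context expansion described in the abstract. A minimal sketch of what such a staged schedule could look like (the stage boundaries, RoPE base values, and greedy packing helper are illustrative assumptions, not the paper's actual configuration):

```python
# Hypothetical three-stage schedule: each stage trains at a longer context
# window, typically with a larger rotary-embedding base frequency.
STAGES = [
    {"context_len": 8_192,   "rope_base": 10_000},
    {"context_len": 32_768,  "rope_base": 500_000},
    {"context_len": 131_072, "rope_base": 5_000_000},
]

def pack_documents(doc_lengths, context_len):
    """Greedily pack document token counts into training sequences no longer
    than context_len. Oversized documents are truncated (a simplification;
    real pipelines usually split them instead)."""
    sequences, current = [], 0
    for n in doc_lengths:
        if current + n > context_len and current > 0:
            sequences.append(current)
            current = 0
        current += min(n, context_len)
    if current:
        sequences.append(current)
    return sequences

# The same corpus yields fewer, longer sequences as the window grows.
short_stage = pack_documents([4_000, 6_000, 20_000, 120_000], 8_192)
# -> [4000, 6000, 8192, 8192]
long_stage = pack_documents([4_000, 6_000, 20_000, 120_000], 131_072)
# -> [30000, 120000]
```

The design intuition is that early short-context stages are cheap and data-efficient, while later stages teach the model to use the full 128K window.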
👥 Authors
Jian Yang
Beihang University
Wei Zhang
Beihang University
Jiajun Wu
Beihang University
Junhang Cheng
Beihang University
Shawn Guo
IQuest Research
Haowen Wang
IQuest Research
Weicheng Gu
Beihang University
Yaxin Du
Shanghai Jiao Tong University
federated learning, LLM agents
Joseph Li
ELLIS
Fanglin Xu
IQuest Research
Yizhi Li
University of Manchester, M-A-P
LLM Reasoning, Post-training, Computational Music
Lin Jing
IQuest Research
Yuanbo Wang
Beihang University
Yuhan Gao
Beihang University
Ruihao Gong
Beihang University
Chuan Hao
IQuest Research
Ran Tao
IQuest Research
Aishan Liu
Beihang University
Tuney Zheng
IQuest Research
Ganqu Cui
Shanghai AI Lab
LLM Alignment, Reinforcement Learning
Zhoujun Li
Beihang University
Artificial Intelligence, Natural Language Processing, Network Security
Mingjie Tang
Purdue University
databases, data mining, machine learning, spatial data processing
Chenghua Lin
Professor of Natural Language Processing, University of Manchester
natural language processing, natural language generation, machine learning
Wayne Xin Zhao
Professor, Renmin University of China
Recommender Systems, Natural Language Processing, Large Language Models
Xianglong Liu
Beihang University
Ming Zhou
Researcher, Shanghai AI Laboratory
Multi-Agent Learning, Reinforcement Learning, Embodied AI
Bryan Dai
IQuest Research
Weifeng Lv
Beihang University