Every Sample Matters: Leveraging Mixture-of-Experts and High-Quality Data for Efficient and Accurate Code LLM

📅 2025-03-22
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the challenge of balancing performance and inference efficiency in large language models for code, this paper introduces Ling-Coder-Lite—a lightweight, high-efficiency code LLM. Methodologically, it pioneers a synergistic optimization paradigm integrating program analysis—both static and dynamic—for high-quality data curation with a Mixture-of-Experts (MoE) sparse architecture, complemented by targeted post-training strategies. Experiments demonstrate that Ling-Coder-Lite matches the performance of Qwen2.5-Coder-7B and DeepSeek-Coder-V2-Lite across 12 mainstream programming benchmarks, while reducing inference latency by 42%, increasing throughput by 1.8×, and cutting deployment resource requirements by 50% at equivalent performance. Both the full model weights and the curated high-quality code dataset are publicly released. The core contribution is a program-semantics-aware joint optimization framework unifying data selection and architectural design, establishing a new paradigm for high-performance code intelligence under resource constraints.

Technology Category

Application Category

📝 Abstract
Recent advancements in code large language models (LLMs) have demonstrated remarkable capabilities in code generation and understanding. It is still challenging to build a code LLM with comprehensive performance yet ultimate efficiency. Many attempts have been released in the open source community to break the trade-off between performance and efficiency, such as the Qwen Coder series and the DeepSeek Coder series. This paper introduces yet another attempt in this area, namely Ling-Coder-Lite. We leverage the efficient Mixture-of-Experts (MoE) architecture along with a set of high-quality data curation methods (especially those based on program analytics) to build an efficient yet powerful code LLM. Ling-Coder-Lite exhibits on-par performance on 12 representative coding benchmarks compared to state-of-the-art models of similar size, such as Qwen2.5-Coder-7B and DeepSeek-Coder-V2-Lite, while offering competitive latency and throughput. In practice, we achieve a 50% reduction in deployment resources compared to the similar-sized dense model without performance loss. To facilitate further research and development in this area, we open-source our models as well as a substantial portion of high-quality data for the annealing and post-training stages. The models and data can be accessed at~url{https://huggingface.co/inclusionAI/Ling-Coder-lite}.
Problem

Research questions and friction points this paper is trying to address.

Balancing performance and efficiency in code LLMs
Leveraging MoE and high-quality data for better models
Reducing deployment resources without performance loss
Innovation

Methods, ideas, or system contributions that make the work stand out.

Utilizes Mixture-of-Experts (MoE) architecture
Employs high-quality data curation methods
Reduces deployment resources by 50%
🔎 Similar Papers
No similar papers found.
W
Wenting Cai
CodeFuse & Ling Team, Ant Group
Yuchen Cao
Yuchen Cao
Carnegie Mellon University
Spatial ComputingComputer VisionArtificial IntelligenceExtended Reality
C
Chaoyu Chen
CodeFuse & Ling Team, Ant Group
C
Chen Chen
CodeFuse & Ling Team, Ant Group
S
Siba Chen
CodeFuse & Ling Team, Ant Group
Q
Qing Cui
CodeFuse & Ling Team, Ant Group
Peng Di
Peng Di
Senior Staff Engineer at Ant Group; Adjunct Associate Professor at UNSW Sydney
Parallel ComputingProgramming LanguageCompilerSoftware Engineering
J
Junpeng Fang
CodeFuse & Ling Team, Ant Group
Z
Zi Gong
CodeFuse & Ling Team, Ant Group
T
Ting Guo
CodeFuse & Ling Team, Ant Group
Z
Zhengyu He
CodeFuse & Ling Team, Ant Group
Y
Yang Huang
CodeFuse & Ling Team, Ant Group
C
Cong Li
CodeFuse & Ling Team, Ant Group
Jianguo Li
Jianguo Li
Director, Ant Group
deep learningcomputer visionmachine learningsystem
Z
Zheng Li
CodeFuse & Ling Team, Ant Group
S
Shijie Lian
CodeFuse & Ling Team, Ant Group
B
BingChang Liu
CodeFuse & Ling Team, Ant Group
S
Songshan Luo
CodeFuse & Ling Team, Ant Group
S
Shuo Mao
CodeFuse & Ling Team, Ant Group
M
Min Shen
CodeFuse & Ling Team, Ant Group
J
Jian Wu
CodeFuse & Ling Team, Ant Group
Jiaolong Yang
Jiaolong Yang
Microsoft Research
3D Computer Vision
W
Wenjie Yang
CodeFuse & Ling Team, Ant Group
T
Tong Ye
CodeFuse & Ling Team, Ant Group
H
Hang Yu
CodeFuse & Ling Team, Ant Group
W
Wei Zhang
CodeFuse & Ling Team, Ant Group
Z
Zhenduo Zhang
CodeFuse & Ling Team, Ant Group
X
Xunjin Zheng
CodeFuse & Ling Team, Ant Group
J
Jun Zhou
CodeFuse & Ling Team, Ant Group