Mellum: Production-Grade in-IDE Contextual Code Completion with Multi-File Project Understanding

📅 2025-10-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the fundamental trade-off among latency, computational cost, and completion quality in context-aware code completion within IDEs, proposing an industrial-grade solution for the JetBrains platform. Methodologically, the authors adopt a 4B-parameter Llama-style architecture pre-trained on ~4 trillion tokens of multilingual open-source code; develop an end-to-end training pipeline combining fill-in-the-middle pre-training, project-level contextual fine-tuning, and Direct Preference Optimization (DPO) guided by real-world user feedback; and introduce a lightweight context-compression mechanism and compact model architecture, integrated with the IDE's context-packing capabilities. Results demonstrate significant improvements over baselines in both offline benchmarks and online telemetry: P95 latency remains under 300 ms, multi-file project understanding is supported, and the model is publicly released under the Apache-2.0 license, already serving hundreds of thousands of developers.
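The fill-in-the-middle (FIM) objective mentioned above trains the model to complete code at an arbitrary cursor position rather than only left-to-right. A minimal sketch of how such a prompt is typically assembled for Llama-style code models is shown below; the sentinel token names are illustrative assumptions, not Mellum's actual vocabulary.

```python
# Illustrative fill-in-the-middle (FIM) prompt construction for code completion.
# The sentinel tokens below are assumptions for illustration; the actual special
# tokens used by Mellum may differ.

def build_fim_prompt(prefix: str, suffix: str,
                     pre: str = "<fim_prefix>",
                     suf: str = "<fim_suffix>",
                     mid: str = "<fim_middle>") -> str:
    """Arrange code around the cursor in prefix-suffix-middle (PSM) order.

    The model generates the missing middle span after the `mid` sentinel.
    """
    return f"{pre}{prefix}{suf}{suffix}{mid}"

# Example: the cursor sits after `return `, and the suffix follows the gap.
prompt = build_fim_prompt("def add(a, b):\n    return ", "\n\nprint(add(1, 2))")
```

At inference time, the editor supplies the text before and after the caret as prefix and suffix, and the model's generation fills the gap between them.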

📝 Abstract
We present the Mellum models family, open-weight code completion models designed for interactive use in JetBrains IDEs. Mellums have 4B parameters, adopt a Llama-style architecture, and are pre-trained on ~4T tokens of permissively licensed, multi-language code. Our studies show that (i) careful data curation and staged training significantly improve the model's quality, (ii) editor-critical capabilities such as context packing are necessary for high-quality suggestions, and (iii) a compact, task-focused model can meet the cost and latency constraints of interactive completion. In the paper, we describe an end-to-end industrial pipeline for producing contextualized in-editor completion: disciplined data governance, multi-stage training that includes fill-in-the-middle and project context via supervised fine-tuning, and alignment via direct preference optimization using feedback from real-world scenarios. Our quality evaluations include both large-scale offline benchmarks and online telemetry from production deployments in JetBrains IDEs. Mellums are released under the Apache-2.0 license on HuggingFace, with a public model card providing a reproducible reference for practitioners. Our experience offers a pragmatic blueprint for taking a focused, open model from a research prototype to at-scale production for hundreds of thousands of users.
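The alignment stage mentioned in the abstract uses direct preference optimization (DPO), which fine-tunes the policy directly on preference pairs without a separate reward model. For reference, the standard DPO objective (Rafailov et al., 2023) over preferred completions $y_w$ and dispreferred completions $y_l$ is:

```latex
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}})
= -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}
\left[
  \log \sigma\!\left(
    \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
    - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
  \right)
\right]
```

Here $\pi_{\mathrm{ref}}$ is the frozen reference (SFT) model, $\sigma$ is the logistic function, and $\beta$ controls how far the policy may drift from the reference; in this paper the preference pairs come from real-world user feedback.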
Problem

Research questions and friction points this paper is trying to address.

Developing contextual code completion for multi-file projects
Optimizing model quality through data curation and training
Meeting latency constraints for interactive IDE integration
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-stage training with supervised fine-tuning
Context packing for editor-critical capabilities
Compact model optimized for cost and latency
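The context-packing idea behind the bullets above can be sketched as a greedy budget-filling step: the local file around the cursor is always kept, and relevant snippets from other project files are prepended while they fit the token budget. This is a minimal illustration under assumed interfaces, not the actual JetBrains implementation.

```python
# Minimal sketch of greedy context packing for multi-file completion.
# Assumption: snippets arrive pre-scored for relevance; the real system's
# retrieval, scoring, and compression are more involved.

def pack_context(local_tokens: list, snippets: list, budget: int) -> list:
    """Fill a token budget with project snippets, keeping the local file intact.

    local_tokens: tokens of the file being edited (always included).
    snippets: list of (relevance_score, token_list) from other project files.
    budget: total token budget for the packed prompt.
    """
    packed = list(local_tokens)
    remaining = budget - len(packed)
    # Most relevant snippets first; skip any that would overflow the budget.
    for _score, toks in sorted(snippets, key=lambda s: s[0], reverse=True):
        if len(toks) <= remaining:
            packed = toks + packed  # project context precedes the local file
            remaining -= len(toks)
    return packed
```

Keeping the local prefix/suffix inviolate while trading off cross-file context against the budget is what lets a compact model stay within interactive latency limits.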
Nikita Pavlichenko
JetBrains, Berlin, Germany
Iurii Nazarov
JetBrains, Munich, Germany
Ivan Dolgov
JetBrains, Berlin, Germany
Ekaterina Garanina
JetBrains, Yerevan, Armenia
Dmitry Ustalov
JetBrains, Belgrade, Serbia
Ivan Bondyrev
JetBrains, Amsterdam, The Netherlands
Kseniia Lysaniuk
JetBrains, Bremen, Germany
Evgeniia Vu
JetBrains, Berlin, Germany
Kirill Chekmenev
JetBrains, Amsterdam, The Netherlands
Joseph Shtok
JetBrains, Prague, Czech Republic
Yaroslav Golubev
JetBrains Research
OSS licenses, code changes, refactorings, software ecosystems, empirical software engineering
Anton Semenkin
JetBrains, Belgrade, Serbia
Uladzislau Sazanovich
JetBrains, Munich, Germany