🤖 AI Summary
This work addresses the fundamental trade-off among latency, computational cost, and completion quality in context-aware code completion within IDEs, proposing an industrial-grade solution for the JetBrains platform. Methodologically, the authors adopt a 4B-parameter Llama-style architecture pretrained on ~4 trillion tokens of multilingual open-source code; develop an end-to-end training pipeline integrating fill-in-the-middle (FIM) pretraining, project-level contextual fine-tuning, and Direct Preference Optimization (DPO) guided by real-world user feedback; and introduce a lightweight context-compression mechanism alongside a compact model architecture, deeply integrated with the IDE's context-packing capabilities. Results demonstrate significant improvements over baselines in both offline benchmarks and online telemetry: P95 latency remains under 300 ms, multi-file project understanding is supported, and the model is publicly released under the Apache-2.0 license, already serving hundreds of thousands of developers.
📝 Abstract
We present the Mellum family of models: open-weight code completion models designed for interactive use in JetBrains IDEs. Mellums have 4B parameters, adopt a Llama-style architecture, and are pre-trained on ~4T tokens of permissively licensed, multi-language code. Our studies show that (i) careful data curation and staged training significantly improve model quality, (ii) editor-critical capabilities such as context packing are necessary for high-quality suggestions, and (iii) a compact, task-focused model can meet the cost and latency constraints of interactive completion.
In this paper, we describe an end-to-end industrial pipeline for producing contextualized in-editor completion: disciplined data governance, multi-stage training that includes fill-in-the-middle and project context via supervised fine-tuning, and alignment via direct preference optimization using feedback from real-world scenarios. Our quality evaluations include both large-scale offline benchmarks and online telemetry from production deployments in JetBrains IDEs. Mellums are released under the Apache-2.0 license on Hugging Face, with a public model card providing a reproducible reference for practitioners. Our experience offers a pragmatic blueprint for taking a focused, open model from a research prototype to at-scale production serving hundreds of thousands of users.
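To make the fill-in-the-middle training objective mentioned above concrete, here is a minimal sketch of PSM-style (prefix-suffix-middle) prompt construction for in-editor completion. The sentinel token strings and the helper function are illustrative placeholders, not Mellum's actual tokenizer vocabulary or API:

```python
# Minimal sketch of fill-in-the-middle (FIM) prompt construction.
# Sentinel token names are generic placeholders; a real model's
# tokenizer defines its own special tokens for this purpose.
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Pack the code before and after the cursor into a PSM-ordered prompt.

    The model is then asked to generate the "middle" span, i.e. the
    completion at the cursor position, conditioned on both sides.
    """
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


# Example: the cursor sits inside a function body, so the prefix is the
# code above the cursor and the suffix is the code below it.
prompt = build_fim_prompt(
    prefix="def add(a, b):\n    return ",
    suffix="\n\nprint(add(1, 2))\n",
)
```

In production, the IDE-side context packer would additionally prepend relevant cross-file snippets to the prefix before the prompt reaches the model, subject to the latency budget.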