🤖 AI Summary
This work addresses the fundamental trade-off among latency, computational cost, and completion quality in context-aware code completion within IDEs, proposing an industrial-grade solution for the JetBrains platform. Methodologically, the authors adopt a 4B-parameter Llama-style architecture pretrained on ~4 trillion tokens of multilingual open-source code; develop an end-to-end training pipeline integrating fill-in-the-middle (FIM) pretraining, project-level contextual fine-tuning, and Direct Preference Optimization (DPO) guided by real-world user feedback; and introduce a lightweight context-compression mechanism alongside a compact model architecture, deeply integrated with the IDE's context-packing capabilities. Results demonstrate significant improvements over baselines in both offline benchmarks and online telemetry: P95 latency remains under 300 ms, multi-file project understanding is supported, and the model is publicly released under the Apache-2.0 license, already serving hundreds of thousands of developers.
📝 Abstract
We present the Mellum family of models: open-weight code completion models designed for interactive use in JetBrains IDEs. Mellums have 4B parameters, adopt a Llama-style architecture, and are pre-trained on ~4T tokens of permissively licensed, multi-language code. Our studies show that (i) careful data curation and staged training significantly improve model quality, (ii) editor-critical capabilities such as context packing are necessary for high-quality suggestions, and (iii) a compact, task-focused model can meet the cost and latency constraints of interactive completion.
In this paper, we describe an end-to-end industrial pipeline for producing contextualized in-editor completion: disciplined data governance, multi-stage training that includes fill-in-the-middle and project context via supervised fine-tuning, and alignment via direct preference optimization using feedback from real-world scenarios. Our quality evaluations include both large-scale offline benchmarks and online telemetry from production deployments in JetBrains IDEs. Mellums are released under the Apache-2.0 license on Hugging Face, with a public model card providing a reproducible reference for practitioners. Our experience offers a pragmatic blueprint for taking a focused, open model from a research prototype to at-scale production serving hundreds of thousands of users.
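To make the fill-in-the-middle training objective mentioned above concrete, here is a minimal sketch of PSM-style (prefix-suffix-middle) prompt construction for in-editor completion. The sentinel token strings and the helper function are illustrative placeholders, not Mellum's actual tokenizer vocabulary or API:

```python
# Minimal sketch of fill-in-the-middle (FIM) prompt construction.
# Sentinel token names are generic placeholders; a real model's
# tokenizer defines its own special tokens for this purpose.
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Pack the code before and after the cursor into a PSM-ordered prompt.

    The model is then asked to generate the "middle" span, i.e. the
    completion at the cursor position, conditioned on both sides.
    """
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


# Example: the cursor sits inside a function body, so the prefix is the
# code above the cursor and the suffix is the code below it.
prompt = build_fim_prompt(
    prefix="def add(a, b):\n    return ",
    suffix="\n\nprint(add(1, 2))\n",
)
```

In production, the IDE-side context packer would additionally prepend relevant cross-file snippets to the prefix before the prompt reaches the model, subject to the latency budget.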