Apertus LLM Family Expansion via Distillation and Quantization

📅 2026-05-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the need to build LLM families efficiently so that they fit diverse hardware and budget constraints. Starting from the open-recipe Apertus 8B model, the authors produce the Apertus-v1.1 series, a distilled family of models with up to 4B parameters, by combining knowledge distillation and quantization with training on 1.7 trillion tokens of permissively licensed data. The authors report that the approach is cost-efficient and retains strong accuracy while covering a wide range of hardware and system requirements. The results support distillation and quantization as a cost-effective way to expand a model family to new sizes and hardware formats.
📝 Abstract
The wide adoption of LLMs has led to their use in a great variety of applications and scenarios, such as chatbot assistants and data annotation, creating the need for models to satisfy specific budget and hardware constraints. As a result, LLMs are increasingly released in batches of similar models of various sizes, so that the family covers as wide a range of constraints as possible. In this paper, we validate distillation and quantization as a cost-effective way to expand model families to new sizes and hardware formats. Based on the open-recipe Apertus 8B LLM, we produce Apertus-v1.1, a distilled family of models with up to 4B parameters trained on 1.7T permissively licensed tokens. We demonstrate the cost-efficiency and strong accuracy of our approach in covering a wide range of hardware and system requirements.
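For readers unfamiliar with the two techniques named in the abstract, here is a minimal sketch of each in their generic textbook forms: a temperature-softened KL distillation loss and symmetric per-tensor int8 weight quantization. The abstract does not state which loss, temperature, or quantization formats the authors use, so everything below (function names, the temperature of 2.0, the int8 scheme) is an illustrative assumption and not the paper's recipe.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of logits divided by a temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_sum_exp for x in scaled]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) over temperature-softened distributions.

    Multiplying by temperature**2 is a common convention that keeps
    gradient magnitudes comparable across temperatures (assumed here).
    """
    log_p = log_softmax(teacher_logits, temperature)
    log_q = log_softmax(student_logits, temperature)
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return kl * temperature ** 2


def quantize_int8(weights):
    """Symmetric per-tensor quantization to integer codes in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate floating-point weights."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    teacher = [2.0, 0.5, -1.0]   # hypothetical teacher logits for one token
    student = [1.5, 0.7, -0.8]   # hypothetical student logits for the same token
    print("distillation loss:", distillation_loss(teacher, student))

    weights = [0.31, -1.0, 0.07, 0.64]  # hypothetical weight values
    codes, scale = quantize_int8(weights)
    print("int8 codes:", codes)
    print("reconstructed:", dequantize(codes, scale))
```

In this sketch the student is penalized for diverging from the teacher's softened output distribution, and quantization trades a bounded rounding error (at most half of `scale` per weight) for a smaller storage format.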
Problem

Research questions and friction points this paper is trying to address.

LLM family expansion
hardware constraints
budget constraints
model scalability
resource-efficient deployment
Innovation

Methods, ideas, or system contributions that make the work stand out.

distillation
quantization
model family expansion
cost-efficient LLMs
hardware-aware deployment