🤖 AI Summary
This work addresses German natural language understanding (NLU), text embedding, and long-context reasoning under resource constraints. We propose ModernGBERT, a natively German encoder trained from scratch at two scales (134M and 1B parameters), and systematically compare it against encoder variants derived from the German decoder LLM LLäMmlein via LLM2Vec. To our knowledge, this is the first full-scale pretraining of a monolingual German encoder incorporating ModernBERT's architectural enhancements. We use a controlled multi-task evaluation setup for German, including GLUE-de, STS-de, and LongBEIR-de, to uniformly assess both native and converted encoders for performance and parameter efficiency. Experiments demonstrate that ModernGBERT-1B surpasses prior state-of-the-art German encoders and all LLäMmlein2Vec variants on multiple NLU and embedding tasks, achieving higher accuracy with superior parameter efficiency. All models, datasets, and code are publicly released.
📝 Abstract
Despite the prominence of decoder-only language models, encoders remain crucial for resource-constrained applications. We introduce ModernGBERT (134M, 1B), a fully transparent family of German encoder models trained from scratch, incorporating architectural innovations from ModernBERT. To evaluate the practical trade-offs of training encoders from scratch, we also present LLäMmlein2Vec (120M, 1B, 7B), a family of encoders derived from German decoder-only models via LLM2Vec. We benchmark all models on natural language understanding, text embedding, and long-context reasoning tasks, enabling a controlled comparison between dedicated encoders and converted decoders. Our results show that ModernGBERT 1B outperforms prior state-of-the-art German encoders as well as encoders adapted via LLM2Vec in both performance and parameter efficiency. All models, training data, checkpoints, and code are publicly available, advancing the German NLP ecosystem with transparent, high-performance encoder models.
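The decoder-to-encoder conversion mentioned above hinges on one structural change: LLM2Vec drops the decoder's causal attention mask so every token can attend to the whole sequence (the method then continues with additional adaptation training, which is omitted here). A minimal NumPy sketch, not the paper's or the LLM2Vec library's code, of that mask change in a single attention layer:

```python
import numpy as np

def attention_weights(q, k, causal):
    """Toy single-head attention; returns the softmax weight matrix.

    With causal=True (decoder-style), each position may only attend to
    itself and earlier positions; with causal=False (encoder-style, as
    after LLM2Vec conversion), all positions attend everywhere.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    if causal:
        n = scores.shape[0]
        # Mask out future positions (strict upper triangle).
        scores = np.where(np.triu(np.ones((n, n), dtype=bool), k=1), -np.inf, scores)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return w / w.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
q = rng.standard_normal((4, 8))
k = rng.standard_normal((4, 8))

w_causal = attention_weights(q, k, causal=True)   # upper triangle is exactly 0
w_bidir = attention_weights(q, k, causal=False)   # every entry is positive
```

In the toy example, `w_causal` has zero weight on all future tokens, while `w_bidir` distributes weight over the full sequence, which is what makes the converted model usable as a text encoder.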