🤖 AI Summary
To address the challenges of scarce Japanese training corpora, limited long-context support, and suboptimal computational efficiency in Japanese large language model development, this work introduces the PLaMo 2 series. Methodologically, it adopts a hybrid Samba-based architecture that transitions to full attention via continual pre-training to extend the context length to 32K tokens, generates synthetic Japanese corpora to alleviate data scarcity, and employs weight reuse coupled with structured pruning for efficient training, enabling an 8B-parameter model to match the performance of the previous 100B model. Furthermore, it integrates synthetic instruction tuning, DPO-based alignment, vLLM-accelerated inference, and quantization with minimal accuracy loss. Empirically, PLaMo 2 achieves state-of-the-art results among open Japanese LMs across multiple benchmarks, demonstrating superior instruction following, linguistic fluency, and Japanese-specific knowledge.
📝 Abstract
In this report, we introduce PLaMo 2, a series of Japanese-focused large language models featuring a hybrid Samba-based architecture that transitions to full attention via continual pre-training to support 32K-token contexts. Training leverages extensive synthetic corpora to overcome data scarcity, while computational efficiency is achieved through weight reuse and structured pruning. This efficient pruning methodology produces an 8B model that achieves performance comparable to our previous 100B model. Post-training further refines the models through a pipeline of supervised fine-tuning (SFT) and direct preference optimization (DPO), enhanced by synthetic Japanese instruction data and model merging techniques. Optimized for inference with vLLM and quantized with minimal accuracy loss, the PLaMo 2 models achieve state-of-the-art results on Japanese benchmarks, outperforming similarly sized open models in instruction following, language fluency, and Japanese-specific knowledge.
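As background for the post-training pipeline mentioned above, the standard DPO objective can be sketched as follows. This is a minimal illustration of the published DPO loss for a single preference pair, not the authors' implementation; the function name, argument names, and the beta value are hypothetical.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair.

    Inputs are summed token log-probabilities of the chosen and rejected
    responses under the trained policy and under a frozen reference model
    (in practice, the SFT checkpoint). beta controls the strength of the
    implicit KL constraint toward the reference.
    """
    # Reward margin implied by the two models' log-probability ratios.
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    # -log(sigmoid(beta * margin)), written in the stable softplus form.
    return math.log1p(math.exp(-beta * margin))

# If the policy already prefers the chosen response more strongly than the
# reference does, the margin is positive and the loss falls below log(2).
loss = dpo_loss(-10.0, -12.0, -11.0, -11.5, beta=0.5)
```

Minimizing this loss pushes the policy to raise the likelihood of preferred responses relative to rejected ones while staying close to the reference model.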