PLaMo 2 Technical Report

📅 2025-09-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address scarce Japanese training corpora, limited long-context support, and the high computational cost of developing large language models for Japanese, this work introduces the PLaMo 2 series. Methodologically, it adopts a hybrid Samba-based architecture and extends the context length to 32K tokens via continual pretraining, generates synthetic Japanese corpora to alleviate data scarcity, and employs weight reuse coupled with structured pruning for efficient training, enabling an 8B-parameter model to match the performance of the previous 100B-parameter PLaMo model. Post-training combines synthetic Japanese instruction data, supervised fine-tuning, DPO-based alignment, and model merging, and the released models are optimized for vLLM inference with low-loss quantization. Empirically, PLaMo 2 achieves state-of-the-art results on Japanese benchmarks, outperforming similarly sized open models in instruction following, language fluency, and Japanese-specific knowledge.
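The summary above credits vLLM-accelerated inference and low-loss quantization for deployment efficiency. As a rough, non-authoritative illustration, offline generation with vLLM might look like the sketch below; the model identifier `pfnet/plamo-2-8b`, the `trust_remote_code` flag, and the sampling settings are assumptions rather than details taken from the report.

```python
# Minimal sketch of offline inference with vLLM (model ID and settings are assumptions).
from vllm import LLM, SamplingParams

# PLaMo 2 uses a custom architecture, so loading is assumed to require trust_remote_code.
llm = LLM(model="pfnet/plamo-2-8b", trust_remote_code=True, max_model_len=32768)

params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=256)
prompts = ["日本で一番高い山はどこですか?"]  # "What is the highest mountain in Japan?"

for request_output in llm.generate(prompts, params):
    print(request_output.prompt)
    print(request_output.outputs[0].text)
```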

📝 Abstract
In this report, we introduce PLaMo 2, a series of Japanese-focused large language models featuring a hybrid Samba-based architecture that transitions to full attention via continual pre-training to support 32K token contexts. Training leverages extensive synthetic corpora to overcome data scarcity, while computational efficiency is achieved through weight reuse and structured pruning. This efficient pruning methodology produces an 8B model that achieves performance comparable to our previous 100B model. Post-training further refines the models using a pipeline of supervised fine-tuning (SFT) and direct preference optimization (DPO), enhanced by synthetic Japanese instruction data and model merging techniques. Optimized for inference using vLLM and quantization with minimal accuracy loss, the PLaMo 2 models achieve state-of-the-art results on Japanese benchmarks, outperforming similarly-sized open models in instruction-following, language fluency, and Japanese-specific knowledge.
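The abstract attributes much of the training efficiency to weight reuse combined with structured pruning, which lets the 8B model inherit weights from a larger predecessor. The report's exact pruning criterion is not given here; the sketch below illustrates the general idea only, pruning the output channels of a linear layer by L2 norm so the surviving weights can seed a smaller model (the norm-based criterion, shapes, and function names are assumptions).

```python
# Illustrative sketch of structured pruning for weight reuse (criterion is an assumption).
import torch
import torch.nn as nn

def prune_linear(layer: nn.Linear, keep_out: int) -> tuple[nn.Linear, torch.Tensor]:
    """Keep the `keep_out` output channels with the largest L2 norm."""
    norms = layer.weight.norm(dim=1)                  # one norm per output channel
    keep = torch.topk(norms, keep_out).indices.sort().values
    pruned = nn.Linear(layer.in_features, keep_out, bias=layer.bias is not None)
    pruned.weight.data = layer.weight.data[keep]      # reuse the surviving weights
    if layer.bias is not None:
        pruned.bias.data = layer.bias.data[keep]
    return pruned, keep  # `keep` tells downstream layers which inputs survive

# Example: shrink a hypothetical wide projection to half its output width.
big = nn.Linear(8192, 8192)
small, kept_rows = prune_linear(big, keep_out=4096)
print(small.weight.shape)  # torch.Size([4096, 8192])
```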
Problem

Research questions and friction points this paper is trying to address.

Develop Japanese-focused large language models
Overcome data scarcity with synthetic corpora
Achieve strong performance at lower training cost via pruning techniques
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid Samba-based architecture that transitions to full attention for 32K-token contexts
Training on synthetic corpora plus weight reuse with structured pruning
SFT and DPO pipeline enhanced by model merging (DPO loss sketched below)
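For context on the post-training pipeline, the sketch below shows the standard direct preference optimization (DPO) objective computed from per-sequence log-probabilities under the policy and a frozen reference model; the β value, tensor shapes, and any relation to PLaMo 2's exact recipe are assumptions.

```python
# Minimal DPO loss sketch on per-sequence log-probabilities (hyperparameters are illustrative).
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta: float = 0.1):
    """Standard DPO: prefer the chosen response over the rejected one,
    measured relative to a frozen reference model."""
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()

# Toy example with a batch of two preference pairs.
loss = dpo_loss(torch.tensor([-10.0, -12.0]), torch.tensor([-15.0, -14.0]),
                torch.tensor([-11.0, -12.5]), torch.tensor([-14.0, -13.5]))
print(loss.item())
```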
👥 Authors
Kaizaburo Chubachi, Yasuhiro Fujita, Shinichi Hemmi, Yuta Hirokawa, Toshiki Kataoka, Goro Kobayashi, Kenichi Maehashi, Calvin Metzger, Hiroaki Mikami, Shogo Murai, Daisuke Nishino, Kento Nozawa, Shintarou Okada, Daisuke Okanohara, Shunta Saito, Shotaro Sano, Shuji Suzuki, Daisuke Tanaka, Avinash Ummadisingu, Hanqin Wang, Sixue Wang, Tianqi Xu (all with Preferred Networks, Inc.)