OpusLM: A Family of Open Unified Speech Language Models

📅 2025-06-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the limited openness and poor multi-task compatibility of existing speech-language models, this paper introduces the open and unified OpusLM model family. Methodologically, OpusLM initializes with a decoder-only large language model and incorporates speech-text joint tokenization, a multi-stream input architecture, and a staged progressive training strategy, trained uniformly on 213K hours of speech-text pairs and 292B text tokens. Our key contributions are threefold: (1) the first fully open-source, end-to-end reproducible framework unifying automatic speech recognition, text-to-speech synthesis, and pure text understanding; (2) state-of-the-art performance across multiple benchmarks—matching or surpassing leading closed-source models; and (3) complete public release of all code, data, model checkpoints, and training logs, significantly advancing open research in speech-language modeling.

📝 Abstract
This paper presents Open Unified Speech Language Models (OpusLMs), a family of open foundational speech language models (SpeechLMs) up to 7B parameters. Initialized from decoder-only text language models, the OpusLMs are continuously pre-trained on 213K hours of speech-text pairs and 292B text-only tokens. We demonstrate that our OpusLMs achieve comparable (or even superior) performance to existing SpeechLMs in speech recognition, speech synthesis, and text-only capabilities. Technically, this paper articulates our SpeechLM designs on tokenization, multi-stream language models, and multi-stage training strategies. We experimentally demonstrate the importance of model size scaling and the effect of annealing data selection. The OpusLMs are built entirely from publicly available materials and are fully transparent models. We release our code, data, checkpoints, and training logs to facilitate open SpeechLM research.
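The abstract highlights speech-text joint tokenization as a core design choice. A minimal sketch of one common realization, assuming an offset scheme where discrete speech codec IDs are shifted to sit above the text vocabulary (the vocabulary sizes and function names below are illustrative assumptions, not taken from the paper):

```python
# Hypothetical joint speech-text vocabulary: codec tokens are mapped above
# the text BPE range so a single decoder-only LM can consume both modalities.
TEXT_VOCAB_SIZE = 50_000   # assumed text BPE vocabulary size
CODEC_VOCAB_SIZE = 1024    # assumed per-stream codec codebook size

def speech_to_joint(codec_ids):
    """Shift raw codec token IDs into the shared joint vocabulary."""
    assert all(0 <= c < CODEC_VOCAB_SIZE for c in codec_ids)
    return [TEXT_VOCAB_SIZE + c for c in codec_ids]

def joint_to_speech(joint_ids):
    """Recover raw codec IDs from joint-vocabulary IDs."""
    return [j - TEXT_VOCAB_SIZE for j in joint_ids]

# Codec IDs 0, 17, 1023 land at 50000, 50017, 51023 in the joint space.
joint = speech_to_joint([0, 17, 1023])
```

The payoff of such a scheme is that ASR and TTS reduce to ordinary next-token prediction over one vocabulary, with modality determined purely by the ID range.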
Problem

Research questions and friction points this paper is trying to address.

Develops open unified speech-language models (OpusLMs) for diverse tasks
Explores scaling and data selection for SpeechLM performance enhancement
Provides transparent models with released resources for open research
Innovation

Methods, ideas, or system contributions that make the work stand out.

Initializes from decoder-only text models
Uses multi-stream language model designs
Implements multi-stage training strategies
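The multi-stream design listed above can be sketched as follows. A common pattern (assumed here, as the paper's exact architecture is not detailed in this summary) is to embed each codec stream separately and sum the per-stream embeddings at every timestep, producing one input vector per position for the decoder-only LM:

```python
import numpy as np

def multi_stream_embed(streams, embed_tables):
    """Sum per-stream token embeddings into one LM input per timestep.

    streams: list of K integer arrays, each of shape (T,) -- one token ID
        per timestep per codec stream.
    embed_tables: list of K embedding matrices, each of shape (V, D).
    Returns an array of shape (T, D): the summed embeddings fed to the LM.
    """
    T = len(streams[0])
    D = embed_tables[0].shape[1]
    x = np.zeros((T, D))
    for ids, table in zip(streams, embed_tables):
        x += table[ids]  # fancy indexing gathers one row per timestep
    return x
```

Summation (rather than concatenation) keeps the LM's hidden width fixed regardless of how many codec streams are used, which is one reason this pattern appears in multi-stream SpeechLMs.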