Towards a Japanese Full-duplex Spoken Dialogue System

📅 2025-06-03
🤖 AI Summary
Background: Research on full-duplex spoken dialogue systems for Japanese has been scarce, and no publicly available model existed before this work, leaving conversational phenomena such as speech overlap and backchanneling largely unmodeled. Method: We build a full-duplex Japanese spoken dialogue model on the Moshi architecture. Training proceeds in two stages: pre-training on large-scale Japanese spoken dialogue data, then fine-tuning on high-quality stereo spoken dialogue data. Synthetic dialogue data generated by a multi-stream TTS system is incorporated to further improve performance. Contribution/Results: We release the first publicly available full-duplex spoken dialogue model in Japanese. In evaluation experiments, the model outperforms Japanese baseline models in both naturalness and meaningfulness.

📝 Abstract
Full-duplex spoken dialogue systems, which can model the simultaneous bidirectional features of human conversation such as speech overlaps and backchannels, have recently attracted significant attention. However, research on full-duplex spoken dialogue systems for Japanese remains scarce. In this paper, we present the first publicly available full-duplex spoken dialogue model in Japanese, built upon Moshi, an English full-duplex dialogue model. Our model is trained in two stages: pre-training on large-scale Japanese spoken dialogue data, followed by fine-tuning on high-quality stereo spoken dialogue data. We further enhance the model's performance by incorporating synthetic dialogue data generated by a multi-stream text-to-speech system. Evaluation experiments demonstrate that the trained model outperforms Japanese baseline models in both naturalness and meaningfulness.
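
To make the notion of speech overlap in two-channel (stereo) dialogue concrete, the following is a minimal illustrative sketch, not the paper's code. It assumes each speaker's channel has already been reduced to a list of (start, end) speech segments in seconds, and that segments within one channel do not overlap each other. The `Segment` type and the `overlap_seconds` function are hypothetical names introduced only for this example.

```python
from typing import List, Tuple

# Hypothetical representation: one (start_sec, end_sec) pair per speech segment.
Segment = Tuple[float, float]


def overlap_seconds(channel_a: List[Segment], channel_b: List[Segment]) -> float:
    """Return the total time during which both channels contain speech.

    Assumes segments within a single channel do not overlap each other.
    """
    total = 0.0
    for a_start, a_end in channel_a:
        for b_start, b_end in channel_b:
            # Length of the intersection of the two intervals (negative if disjoint).
            shared = min(a_end, b_end) - max(a_start, b_start)
            if shared > 0:
                total += shared
    return total


# Illustrative data only: speaker B starts before A finishes, and later
# produces a short backchannel while A is still speaking.
speaker_a = [(0.0, 2.0), (3.5, 5.0)]
speaker_b = [(1.5, 3.0), (4.0, 4.5)]
print(overlap_seconds(speaker_a, speaker_b))
```

A turn-based system would see no such overlap by construction, since only one party speaks at a time. Stereo recordings keep each speaker on a separate channel, which is what makes this kind of simultaneous speech observable in the data.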

Problem

Research questions and friction points this paper is trying to address.

Developing the first Japanese full-duplex spoken dialogue system
Addressing limited research on Japanese full-duplex conversation modeling
Improving naturalness and meaningfulness in Japanese dialogue systems

Innovation

Methods, ideas, or system contributions that make the work stand out.

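First publicly available Japanese full-duplex spoken dialogue model, built upon Moshi
Two-stage training: pre-training on large-scale Japanese spoken dialogue data, then fine-tuning on high-quality stereo dialogue data
Further enhancement with synthetic dialogue data generated by a multi-stream TTS system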

Authors

Atsumoto Ohashi
Nagoya University
Dialogue Systems, Natural Language Processing

Shinya Iizuka
Graduate School of Informatics, Nagoya University, Japan

Jingjing Jiang
Graduate School of Informatics, Nagoya University, Japan

Ryuichiro Higashinaka
Nagoya University
Dialogue Systems, Spoken Dialogue Systems, Question Answering