🤖 AI Summary
In multiplayer online gaming, general-purpose automatic speech recognition (ASR) systems suffer from high error rates because utterances are short, speech is fast, terminology is domain-specific, and background noise is strong. To address these challenges, this paper proposes GO-AEC, a framework that combines large language models (LLMs) with retrieval-augmented generation (RAG) over a dynamic game-specific knowledge base. It corrects errors using N-best ASR hypotheses and adds a data augmentation strategy that uses LLMs and text-to-speech (TTS). In the reported experiments, GO-AEC reduces character error rate (CER) by 6.22% and sentence error rate (SER) by 29.71%, improving ASR accuracy in gaming scenarios.
📝 Abstract
With the rise of multiplayer online games, real-time voice communication has become essential for team coordination. However, general automatic speech recognition (ASR) systems struggle with gaming-specific challenges such as short phrases, rapid speech, jargon, and noise, leading to frequent errors. To address this, we propose the GO-AEC framework, which integrates large language models (LLMs), Retrieval-Augmented Generation (RAG), and a data augmentation strategy using LLMs and text-to-speech (TTS). GO-AEC consists of three components: data augmentation, N-best hypothesis-based correction, and a dynamic game knowledge base. Experiments show that GO-AEC reduces character error rate (CER) by 6.22% and sentence error rate (SER) by 29.71%, significantly improving ASR accuracy in gaming scenarios.
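The abstract reports its gains as CER and SER reductions in percent, without saying whether those figures are absolute (percentage points) or relative. As a minimal sketch of how these metrics are commonly computed, and of how the two kinds of reduction differ, the following uses the standard Levenshtein-based definitions. The transcripts are made-up illustrations, not data from the paper, and the function names are hypothetical rather than part of GO-AEC.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (here: strings of characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(refs, hyps):
    """Character error rate: total character edits / total reference characters."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return errors / total


def ser(refs, hyps):
    """Sentence error rate: fraction of utterances with at least one error."""
    wrong = sum(1 for r, h in zip(refs, hyps) if r != h)
    return wrong / len(refs)


def absolute_reduction(before, after):
    """Difference in error rate; multiply by 100 for percentage points."""
    return before - after


def relative_reduction(before, after):
    """Reduction as a fraction of the baseline error rate."""
    return (before - after) / before


# Made-up example transcripts (assumption: not taken from the paper).
refs = ["push mid now", "need heal", "enemy flank left"]
baseline = ["bush mid now", "need heel", "enemy flank left"]
corrected = ["push mid now", "need heel", "enemy flank left"]

ser_before, ser_after = ser(refs, baseline), ser(refs, corrected)
cer_before, cer_after = cer(refs, baseline), cer(refs, corrected)

print(f"SER: {ser_before:.3f} -> {ser_after:.3f}")
print(f"  absolute reduction: {absolute_reduction(ser_before, ser_after):.3f}")
print(f"  relative reduction: {relative_reduction(ser_before, ser_after):.3f}")
print(f"CER: {cer_before:.4f} -> {cer_after:.4f}")
```

In this toy case the same SER improvement is about 33 percentage points in absolute terms but 50% in relative terms, which is why the distinction matters when comparing reported figures.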