Mixture of Small and Large Models for Chinese Spelling Check

📅 2025-06-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
In Chinese spelling correction (CSC), existing large language model (LLM)-based approaches suffer from insufficient correction accuracy, while fine-tuned BERT-style models tend to overfit editing patterns and generalize poorly. To address this, we propose a fine-tuning-free dynamic probabilistic ensemble decoding strategy: during beam search, the output distribution of a lightweight, task-specific BERT variant (providing precise corrections) is fused with that of a frozen, pre-trained LLM (providing natural language fluency) via a dynamic, context-aware weighting mechanism. This fusion mitigates the small model's overfitting to local editing patterns, improving cross-domain adaptability and correction robustness. The method achieves state-of-the-art performance on benchmark datasets including SIGHAN, with gains in both error detection and correction accuracy. The implementation is publicly available.

📝 Abstract
In the era of large language models (LLMs), the Chinese Spelling Check (CSC) task has seen various LLM methods developed, yet their performance remains unsatisfactory. In contrast, fine-tuned BERT-based models, relying on high-quality in-domain data, show excellent performance but suffer from edit pattern overfitting. This paper proposes a novel dynamic mixture approach that effectively combines the probability distributions of small models and LLMs during the beam search decoding phase, achieving a balanced enhancement of precise corrections from small models and the fluency of LLMs. This approach also eliminates the need for fine-tuning LLMs, saving significant time and resources, and facilitating domain adaptation. Comprehensive experiments demonstrate that our mixture approach significantly boosts error correction capabilities, achieving state-of-the-art results across multiple datasets. Our code is available at https://github.com/zhqiao-nlp/MSLLM.
Problem

Research questions and friction points this paper is trying to address.

Improves Chinese spelling check by combining small and large models
Addresses overfitting in BERT models and poor LLM performance
Enhances correction accuracy and fluency without fine-tuning LLMs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic mixture of small and large models
Combines probability distributions during decoding
No fine-tuning needed for large models
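The mixture idea above can be sketched as follows. This is an illustrative toy, not the authors' implementation: the stand-in model distributions, vocabulary, and the confidence-based weighting heuristic are all assumptions (the paper's actual weighting is dynamic and context-dependent; its exact form is not reproduced here). At each beam-search step, the small model's next-token distribution is interpolated with the frozen LLM's distribution before expanding the beam.

```python
import math

# Toy vocabulary; "<eos>" ends a hypothesis.
VOCAB = ["<eos>", "他", "的", "天", "气"]

def small_model_dist(prefix):
    # Stand-in for a fine-tuned BERT-style CSC model: sharp, edit-focused.
    if len(prefix) == 0:
        return {"天": 0.7, "他": 0.2, "的": 0.05, "气": 0.04, "<eos>": 0.01}
    if prefix[-1] == "天":
        return {"气": 0.8, "<eos>": 0.1, "的": 0.05, "他": 0.04, "天": 0.01}
    return {"<eos>": 0.9, "他": 0.03, "的": 0.03, "天": 0.02, "气": 0.02}

def llm_dist(prefix):
    # Stand-in for a frozen, pre-trained LLM: smoother, fluency-oriented.
    if len(prefix) == 0:
        return {"天": 0.4, "他": 0.3, "的": 0.2, "气": 0.05, "<eos>": 0.05}
    if prefix[-1] == "天":
        return {"气": 0.6, "的": 0.2, "<eos>": 0.1, "他": 0.05, "天": 0.05}
    return {"<eos>": 0.7, "的": 0.1, "他": 0.1, "天": 0.05, "气": 0.05}

def mix(p_small, p_llm):
    # Dynamic interpolation weight: trust the small model more when it is
    # confident (an assumed heuristic; the paper computes this per context).
    lam = max(p_small.values())
    return {t: lam * p_small[t] + (1 - lam) * p_llm[t] for t in VOCAB}

def beam_search(beam_size=2, max_len=5):
    beams = [((), 0.0)]  # (token tuple, cumulative log-probability)
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            if prefix and prefix[-1] == "<eos>":
                candidates.append((prefix, score))  # finished hypothesis
                continue
            p = mix(small_model_dist(prefix), llm_dist(prefix))
            for tok, prob in p.items():
                candidates.append((prefix + (tok,), score + math.log(prob)))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
        if all(b[0] and b[0][-1] == "<eos>" for b in beams):
            break
    return beams[0][0]
```

Because both component distributions sum to one, any convex combination also sums to one, so the mixture is a valid distribution at every step; no retraining of either model is required, only access to their per-step token probabilities.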
Ziheng Qiao
School of Computer Science and Technology, Soochow University, China
Houquan Zhou
Soochow University
Zhenghua Li
School of Computer Science and Technology, Soochow University, China