🤖 AI Summary
To address context ambiguity and local semantic loss induced by complex syntactic structures in machine translation, this paper proposes a K-means-guided Transformer architecture. Our method introduces K-means clustering as a preprocessing module to explicitly model word-sense cluster-level context, thereby driving attention recalibration and overcoming the limitations of conventional positional encoding in capturing semantic structure. It integrates semantic-aware dynamic reweighting of attention scores with context-enhanced token embeddings. Evaluated on the WMT2023 multi-domain test set, our approach achieves an average BLEU improvement of 2.3 points and reduces local semantic consistency errors by 31%. It significantly enhances fine-grained synonym discrimination and long-range dependency modeling—demonstrating superior contextual understanding and structural generalization over baseline Transformers.
📝 Abstract
This paper proposes a novel architecture based on the Transformer that incorporates the K-means clustering algorithm to enhance the model's contextual understanding. The Transformer performs well in machine translation tasks thanks to its parallel computation and multi-head attention mechanism, but it can suffer from contextual ambiguity or overlook local features when handling highly complex linguistic structures. To address this limitation, we apply K-means to cluster the words and phrases of the input text, helping the model identify and preserve the local structure and contextual information of the language. The advantage of this combination is that K-means can automatically discover topical or conceptual regions in the text, which may be directly relevant to translation quality. Accordingly, the proposed model uses K-means as a preprocessing step before the Transformer and recalibrates the multi-head attention weights to help distinguish words and phrases with similar semantics or functions. This ensures that, during training, the model gives greater weight to the contextual information carried by these clusters rather than relying solely on positional information.
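The paper does not provide an implementation, but the core idea — cluster tokens with K-means, then add a bias to attention scores for query/key pairs in the same cluster — can be sketched roughly as follows. This is a minimal, illustrative version in NumPy; the function names, the additive `bias` term, and the single-head formulation are assumptions, not the authors' actual code.

```python
import numpy as np

def kmeans(X, k, iters=10, seed=0):
    """Minimal K-means: returns a cluster id for each row (token embedding) of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # assign each point to its nearest center
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(1)
        # recompute centers; keep the old center if a cluster is empty
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    return labels

def cluster_biased_attention(Q, K, V, labels, bias=1.0):
    """Scaled dot-product attention with an additive bonus (hypothetical
    recalibration scheme) for token pairs in the same K-means cluster."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    same_cluster = labels[:, None] == labels[None, :]
    scores = scores + bias * same_cluster  # boost same-cluster attention
    w = np.exp(scores - scores.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)
    return w @ V, w

# toy usage: 8 "token embeddings" of dimension 4
X = np.random.default_rng(1).normal(size=(8, 4))
labels = kmeans(X, k=2)
out, attn = cluster_biased_attention(X, X, X, labels)
```

In a full model this bias would be applied per head before the softmax, so tokens sharing a semantic cluster attend to each other more strongly than positional encoding alone would dictate.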