π€ AI Summary
Complex multi-stroke Chinese characters (e.g., Hanzi and Kanji) impose substantial cognitive load and recognition difficulty on non-native learners. To address this, we propose a data-driven character simplification framework that quantifies the contribution of each stroke to character recognizability using a high-accuracy deep learning recognition modelβmarking the first such stroke-level importance estimation grounded in empirical recognition performance. Our method iteratively removes strokes while evaluating readability, automatically identifying and eliminating redundant strokes without compromising classification accuracy. Evaluated on 1,256 character classes, the approach achieves effective simplification: most characters retain high discriminability after removing 3β5 strokes. This work establishes the first scalable, interpretable, recognition-aware paradigm for stroke importance modeling and systematic simplification of complex logographic scripts. It offers practical implications for second-language pedagogy, font design, and OCR-friendly text generation.
π Abstract
Multi-stroke characters in scripts such as Chinese and Japanese can be highly complex, posing significant challenges for both native speakers and, especially, non-native learners. If these characters can be simplified without degrading their legibility, it could reduce learning barriers for non-native speakers, facilitate simpler and legible font designs, and contribute to efficient character-based communication systems. In this paper, we propose a framework to systematically simplify multi-stroke characters by selectively removing strokes while preserving their overall legibility. More specifically, we use a highly accurate character recognition model to assess legibility and remove those strokes that minimally impact it. Experimental results on 1,256 character classes with 5, 10, 15, and 20 strokes reveal several key findings, including the observation that even after removing multiple strokes, many characters remain distinguishable. These findings suggest the potential for more formalized simplification strategies.