Zero-Shot Chinese Character Recognition via Global-Local Dual-Branch Alignment and Hierarchical Inference

📅 2026-05-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses zero-shot recognition of unseen Chinese characters in open-world scenarios. Existing IDS-based retrieval methods match a single global vector per character image and per glyph description, which overlooks fine-grained differences between local components, while naive patch-token interaction is sensitive to noise from IDS structural operators and is expensive over the full candidate set. To overcome these limitations, the authors propose a Global-Local Hierarchical Perception Network (GL-HPN) that jointly models the global semantics and local structure of character images and ideographic description sequences (IDS) within a unified cross-modal alignment framework. The approach combines a dual-branch alignment mechanism with a structure filtering mask that suppresses interference from IDS operators that have no visual counterpart. A coarse-to-fine hierarchical inference scheme retrieves globally over all candidates, reranks only the Top-$K$ candidates locally, and fuses the normalized posterior scores by parameter-free multiplication. GL-HPN achieves competitive performance across multiple zero-shot splits, performs especially well under low-resource settings, and substantially reduces retrieval overhead for large candidate sets.
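The structure filtering mask can be illustrated with a minimal sketch. The summary does not spell out how local similarities are aggregated, so everything here is an assumption made for illustration: the function names are hypothetical, operators are detected as characters in the Unicode Ideographic Description Characters block (U+2FF0 to U+2FFF), and aggregation is taken to be the mean, over non-operator IDS tokens, of each token's best cosine match among the image patches.

```python
import math


def is_ids_operator(token):
    """True for a character in the Unicode Ideographic Description Characters block."""
    return len(token) == 1 and 0x2FF0 <= ord(token) <= 0x2FFF


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def masked_local_similarity(patch_embs, ids_tokens, token_embs):
    """Mean best-patch similarity over IDS tokens, skipping structural operators.

    Assumed aggregation rule (not taken from the paper): each component token
    is matched to its most similar image patch, and the matches are averaged.
    """
    scores = []
    for token, t_emb in zip(ids_tokens, token_embs):
        if is_ids_operator(token):
            continue  # masked out: describes layout, not a visible component
        scores.append(max(cosine(t_emb, p) for p in patch_embs))
    return sum(scores) / len(scores) if scores else 0.0


# Toy example: 好 decomposes as ⿰ 女 子; the embeddings are made up.
ids_tokens = ["⿰", "女", "子"]
token_embs = [[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
patch_embs = [[1.0, 0.0], [0.0, 1.0]]
score = masked_local_similarity(patch_embs, ids_tokens, token_embs)
```

In this toy case the operator ⿰ would contribute a similarity of about 0.71 if it were left in, pulling the average down even though both components match a patch exactly; with the mask applied, only 女 and 子 are scored.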
📝 Abstract
The set of Chinese character categories is extremely large, and unseen characters frequently arise in open-world scenarios, making zero-shot Chinese character recognition an important yet challenging problem. Existing IDS-based retrieval methods usually encode a character image and its ideographic description sequence (IDS) into a single global vector each for matching. Although efficient, such holistic alignment often under-models local component differences. Moreover, directly introducing fine-grained patch-token interaction suffers from both the noise of structural operators in IDS and the high cost of full-candidate retrieval. To address these issues, we propose a Global-Local Hierarchical Perception Network (GL-HPN), which jointly learns global and local representations of character images and IDS sequences within a unified cross-modal alignment framework. The global branch supports efficient coarse recall, while the local branch improves component-level discrimination through patch-token interaction. We further introduce a structure filtering mask that, during local similarity aggregation, suppresses IDS operators that are structurally meaningful but correspond to no visual entity. On top of this, we design a coarse-to-fine hierarchical inference strategy that performs global retrieval over the full candidate set and local reranking only on the Top-$K$ candidates, followed by parameter-free multiplicative fusion of normalized posterior scores. Experimental results show that GL-HPN achieves competitive performance across multiple zero-shot splits, performs especially well under low-resource settings, and substantially reduces the inference cost of large-scale candidate retrieval.
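The coarse-to-fine inference described in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, similarity scores are supplied as plain numbers rather than computed by a network, and "normalized posterior scores" is interpreted as a softmax over the Top-K shortlist.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hierarchical_inference(global_scores, local_score_fn, k):
    """Coarse global recall, local reranking on the shortlist, multiplicative fusion.

    global_scores: dict mapping candidate -> global similarity (cheap, all candidates).
    local_score_fn: callable candidate -> local similarity (expensive, Top-K only).
    """
    if not global_scores or k < 1:
        raise ValueError("need at least one candidate and k >= 1")
    # Stage 1: rank every candidate by the global score and keep the Top-K.
    shortlist = sorted(global_scores, key=global_scores.get, reverse=True)[:k]
    # Stage 2: run the local branch only on the shortlist.
    global_post = softmax([global_scores[c] for c in shortlist])
    local_post = softmax([local_score_fn(c) for c in shortlist])
    # Fusion: product of the two posteriors, with no learned weights.
    fused = {c: g * l for c, g, l in zip(shortlist, global_post, local_post)}
    return max(fused, key=fused.get), fused


# Toy scores (made up): 奴 narrowly wins globally, 好 wins locally.
global_scores = {"好": 0.80, "奴": 0.82, "妈": 0.60, "字": 0.10}
local_scores = {"好": 0.90, "奴": 0.50, "妈": 0.40, "字": 0.95}
best, fused = hierarchical_inference(global_scores, local_scores.get, k=3)
```

With these toy numbers the global branch alone would pick 奴, while the fused score selects 好. The candidate 字 is never passed to the local scorer because it falls outside the Top-3, which is where the cost saving comes from: the local branch runs K times rather than once per candidate.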
Problem

Research questions and friction points this paper is trying to address.

Zero-Shot Chinese Character Recognition
Open-World Scenarios
Unseen Characters
Ideographic Description Sequence
Cross-Modal Alignment
Innovation

Methods, ideas, or system contributions that make the work stand out.

zero-shot character recognition
global-local alignment
ideographic description sequence
hierarchical inference
cross-modal representation