Parsing Through Boundaries in Chinese Word Segmentation

📅 2025-03-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
Written Chinese has no explicit word boundaries, so word segmentation is inherently ambiguous, and that ambiguity carries over into dependency parsing. This paper systematically investigates how alternative word boundary definitions affect dependency structures in the Chinese GSD treebank, relating segmentation boundaries to dependency arc distributions and syntactic depth. Controlled experiments across multiple segmentation schemes quantify how the choice of boundary definition changes both the distribution of dependency relations and tree complexity (e.g., average dependency length and maximum depth). To support fine-grained linguistic analysis, the authors develop an interactive visualization tool (built with D3.js and React) for comparing structural differences across segmentation schemes in real time. The findings offer empirical evidence relevant to joint segmentation and parsing models and to the evaluation of word-unit choices in Chinese NLP.
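The two tree-complexity measures named in the summary can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes CoNLL-U-style head indices (1-based, with 0 marking the root), measures dependency length as the absolute distance in token positions, and counts the root as depth 0. The function names are hypothetical, and the paper may define the metrics differently.

```python
from typing import List


def dependency_lengths(heads: List[int]) -> List[int]:
    """Return |head - dependent| for every non-root token.

    heads[i] is the 1-based head index of token i + 1; 0 marks the root.
    """
    return [abs(h - (i + 1)) for i, h in enumerate(heads) if h != 0]


def average_dependency_length(heads: List[int]) -> float:
    """Mean dependency length over all non-root arcs (0.0 if there are none)."""
    lengths = dependency_lengths(heads)
    return sum(lengths) / len(lengths) if lengths else 0.0


def max_depth(heads: List[int]) -> int:
    """Largest number of arcs between any token and the root (root = depth 0)."""

    def depth(token: int) -> int:
        steps = 0
        while heads[token - 1] != 0:
            token = heads[token - 1]
            steps += 1
            if steps > len(heads):  # guard against malformed (cyclic) input
                raise ValueError("head indices contain a cycle")
        return steps

    return max((depth(t) for t in range(1, len(heads) + 1)), default=0)


# Hypothetical three-token chain: token 1 -> token 2 -> token 3 (root).
chain = [2, 3, 0]
print(average_dependency_length(chain), max_depth(chain))
```

Because both measures depend on the number and position of tokens, re-segmenting the same sentence changes them even when the underlying analysis is otherwise similar.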

📝 Abstract
Chinese word segmentation is a foundational task in natural language processing (NLP), with far-reaching effects on syntactic analysis. Unlike alphabetic languages such as English, written Chinese lacks explicit word boundaries, making segmentation both necessary and inherently ambiguous. This study highlights the intricate relationship between word segmentation and syntactic parsing, providing a clearer understanding of how different segmentation strategies shape dependency structures in Chinese. Focusing on the Chinese GSD treebank, we analyze multiple word boundary schemes, each reflecting distinct linguistic and computational assumptions, and examine how they influence the resulting syntactic structures. To support detailed comparison, we introduce an interactive web-based visualization tool that displays parsing outcomes across segmentation methods.
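To make the ambiguity concrete, the sketch below compares two segmentations of the same character string by the character offsets at which each places a word boundary. The example string is a well-known textbook ambiguity and is not taken from the paper or the GSD treebank; the function names are hypothetical.

```python
from typing import List, Set, Tuple


def boundaries(tokens: List[str]) -> Set[int]:
    """Character offsets at which each token ends."""
    offsets = set()
    position = 0
    for token in tokens:
        position += len(token)
        offsets.add(position)
    return offsets


def boundary_diff(seg_a: List[str], seg_b: List[str]) -> Tuple[List[int], List[int]]:
    """Boundaries found only in seg_a and only in seg_b, as sorted offsets."""
    if "".join(seg_a) != "".join(seg_b):
        raise ValueError("segmentations must cover the same character string")
    a, b = boundaries(seg_a), boundaries(seg_b)
    return sorted(a - b), sorted(b - a)


# Two readings of the same six characters (illustrative only).
scheme_a = ["研究", "生命", "起源"]   # "study / life / origin"
scheme_b = ["研究生", "命", "起源"]   # "graduate student / fate / origin"
print(boundary_diff(scheme_a, scheme_b))
```

Each boundary that differs changes the token inventory a parser sees, which is why head indices and arc labels cannot be compared across schemes without first aligning on character offsets.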
Problem

Research questions and friction points this paper is trying to address.

Explores Chinese word segmentation's impact on syntactic parsing
Analyzes boundary schemes affecting dependency structures in Chinese
Develops visualization tool to compare segmentation method outcomes
Innovation

Methods, ideas, or system contributions that make the work stand out.

Analyzes multiple Chinese word boundary schemes
Examines impact on syntactic dependency structures
Introduces interactive visualization tool for comparisons
Authors
Yige Chen
College of Computer Science and Artificial Intelligence, Wenzhou University
Networking
Zelong Li
Rutgers University
Automated Machine Learning, Recommendation System, Reinforcement Learning, Explainable AI
Changbing Yang
University of British Columbia
Cindy Zhang
The University of British Columbia, Canada
Amandisa Cady
The University of British Columbia, Canada
Ai Ka Lee
The University of British Columbia, Canada
Zejiao Zeng
The University of British Columbia, Canada
Haihua Pan
The Chinese University of Hong Kong, Hong Kong
Jungyeul Park
The University of British Columbia, Canada