AI Summary
Most existing English treebanks adopt phrase-structure grammars, limiting their utility for theory-driven syntactic modeling. To address this, we introduce CGELBank, the first fine-grained, fully formalized syntactic treebank grounded in *The Cambridge Grammar of the English Language* (CGEL). Our approach features a theoretically consistent annotation schema with explicit functional hierarchies and constructional compatibility, going beyond conventional category-based labeling; a dedicated annotation toolchain with automated consistency verification; and a publicly released v1.1 annotation manual that supports reproducibility and interpretability. As a high-fidelity, computationally tractable CGEL-aligned resource, CGELBank enables rigorous integration of linguistic theory into NLP and is intended to improve the theoretical interpretability and structural generalization capacity of syntactic models.
Abstract
CGELBank is a treebank and associated tools based on a syntactic formalism for English derived from the Cambridge Grammar of the English Language. This document lays out the particularities of the CGELBank annotation scheme.