🤖 AI Summary
Large language models (LLMs) exhibit notable deficiencies in foundational NLP tasks requiring deep linguistic understanding—such as syntactic parsing—primarily due to their failure to effectively leverage explicit grammatical rules encoded in treebanks. To address this, we propose a zero-shot, training-free self-correction framework that guides LLMs to iteratively refine syntactic outputs via three core components: automatic grammar error detection, dynamic retrieval of treebank-derived grammatical rules, and context-aware prompt construction. Our approach introduces the first zero-shot self-correction mechanism grounded in multilingual (English and Chinese) treebank grammars, eliminating reliance on fine-tuning or manual prompt engineering. Evaluated on three standard syntactic parsing benchmarks, the method achieves significant accuracy improvements, demonstrates strong cross-domain generalization, and delivers consistent performance gains in both English and Chinese settings.
📝 Abstract
Large language models (LLMs) have achieved remarkable success across various natural language processing (NLP) tasks. However, recent studies suggest that they still struggle with fundamental NLP tasks essential for deep language understanding, particularly syntactic parsing. In this paper, we conduct an in-depth analysis of LLM parsing capabilities, delving into the specific shortcomings of their parsing results. We find that these shortcomings may stem from LLMs' limited ability to fully leverage the grammar rules in existing treebanks, which restricts their capability to generate valid syntactic structures. To help LLMs acquire this knowledge without additional training, we propose a self-correction method that leverages grammar rules from existing treebanks to guide LLMs in correcting their previous errors. Specifically, we automatically detect potential errors and dynamically search for relevant rules, offering hints and examples that guide LLMs in making corrections themselves. Experimental results on three datasets with various LLMs demonstrate that our method significantly improves performance in both in-domain and cross-domain settings on English and Chinese datasets.
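The detect-retrieve-reprompt loop described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the function names (`detect_errors`, `retrieve_rules`, `build_prompt`, `self_correct`), the representation of a parse as a list of `(lhs, rhs)` productions, and the grammar-membership error check are all assumptions introduced here for clarity.

```python
# Hypothetical sketch of the zero-shot self-correction loop from the abstract.
# A parse is modeled as a list of (lhs, rhs) grammar productions; `grammar_rules`
# is the set of productions extracted from a treebank. All names are illustrative.

def detect_errors(parse, grammar_rules):
    """Flag productions in the predicted parse absent from the treebank grammar."""
    return [prod for prod in parse if prod not in grammar_rules]

def retrieve_rules(errors, grammar_rules):
    """Retrieve treebank rules sharing a left-hand side with each bad production."""
    hints = []
    for lhs, _rhs in errors:
        hints.extend(rule for rule in grammar_rules if rule[0] == lhs)
    return hints

def build_prompt(sentence, parse, errors, hints):
    """Assemble a correction prompt with hints and valid rules as guidance."""
    return (f"Sentence: {sentence}\n"
            f"Previous parse: {parse}\n"
            f"Suspicious productions: {errors}\n"
            f"Valid treebank rules: {hints}\n"
            "Please output a corrected parse.")

def self_correct(sentence, llm, grammar_rules, max_rounds=3):
    """Iteratively ask the LLM to repair its own parse until no errors remain."""
    parse = llm(f"Parse: {sentence}")
    for _ in range(max_rounds):
        errors = detect_errors(parse, grammar_rules)
        if not errors:
            break
        hints = retrieve_rules(errors, grammar_rules)
        parse = llm(build_prompt(sentence, parse, errors, hints))
    return parse
```

Because the loop is training-free, any callable that maps a prompt string to a parse can stand in for `llm`, and the grammar set can be swapped per language (e.g., an English or Chinese treebank) without touching the loop itself.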