Constituency Parsing using LLMs

πŸ“… 2023-10-30
πŸ›οΈ arXiv.org
πŸ“ˆ Citations: 1
✨ Influential: 0
πŸ€– AI Summary
This paper addresses constituency parsingβ€”a fundamental yet unsolved challenge in NLPβ€”by proposing an LLM-driven parsing paradigm that eschews explicit grammar modeling. Methodologically, it introduces three tree-linearization strategies that encode constituency trees as symbol sequences, so that LLMs can parse by pure text generation. The approach is evaluated across zero-shot, few-shot, and fully supervised settings using a diverse set of models, including ChatGPT, GPT-4, OPT, LLaMA, and Alpaca. Key contributions include: (i) the first comprehensive empirical analysis revealing both the generalization bottlenecks and the latent capabilities of LLMs for constituency parsing; and (ii) strong cross-domain generalization on multiple benchmarks, with performance approaching that of state-of-the-art specialized parsers in certain configurations. The work provides empirical validation and a methodological foundation for lightweight, grammar-agnostic, and broadly applicable constituency parsing.
πŸ“ Abstract
Constituency parsing is a fundamental yet unsolved natural language processing task. In this paper, we explore the potential of recent large language models (LLMs) that have exhibited remarkable performance across various domains and tasks to tackle this task. We employ three linearization strategies to transform output trees into symbol sequences, such that LLMs can solve constituency parsing by generating linearized trees. We conduct experiments using a diverse range of LLMs, including ChatGPT, GPT-4, OPT, LLaMA, and Alpaca, comparing their performance against the state-of-the-art constituency parsers. Our experiments encompass zero-shot, few-shot, and full-training learning settings, and we evaluate the models on one in-domain and five out-of-domain test datasets. Our findings reveal insights into LLMs' performance, generalization abilities, and challenges in constituency parsing.
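The linearization idea above can be illustrated with a minimal sketch. The snippet below shows bracket-based linearization, one common way to flatten a constituent tree into a symbol sequence; the paper's three strategies may differ in detail, and the tree representation here (nested tuples) is an assumption for illustration.

```python
# Minimal sketch: bracket-based linearization of a constituent tree.
# A tree is a (label, children) tuple; a leaf is (POS_tag, word_string).

def linearize(tree):
    """Encode a constituent tree as a bracketed symbol sequence."""
    label, children = tree
    if isinstance(children, str):  # leaf: (POS, word)
        return f"({label} {children})"
    return "(" + label + " " + " ".join(linearize(c) for c in children) + ")"

tree = ("S", (("NP", (("PRP", "She"),)),
              ("VP", (("VBZ", "reads"),
                      ("NP", (("NNS", "books"),))))))

print(linearize(tree))
# (S (NP (PRP She)) (VP (VBZ reads) (NP (NNS books))))
```

With trees encoded this way, an LLM can "parse" by generating the bracketed string for an input sentence, and the tree can be recovered by reversing the encoding.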
Problem

Research questions and friction points this paper is trying to address.

Can LLMs solve constituency parsing purely via sequence generation?
How can generated outputs be guaranteed to form valid constituent trees, e.g. by learning from erroneous samples?
Can multi-agent collaboration refine LLM outputs to improve parsing quality?
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reformulates parsing as sequence-to-sequence generation
Guides LLMs by learning from erroneous samples
Refines outputs via multi-agent collaboration
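The sequence-to-sequence reformulation can be sketched as a prompting step: demonstrations pair a sentence with its linearized tree, and the model is asked to generate the bracketed sequence for a new sentence. The prompt wording and helper below are illustrative assumptions, not taken from the paper.

```python
# Hypothetical few-shot prompt construction for LLM-based constituency
# parsing. Each demonstration is a (sentence, linearized_tree) pair.

def build_prompt(demos, sentence):
    """Assemble a few-shot prompt ending where the LLM should generate a tree."""
    lines = ["Parse each sentence into a bracketed constituency tree."]
    for sent, tree in demos:
        lines.append(f"Sentence: {sent}\nTree: {tree}")
    lines.append(f"Sentence: {sentence}\nTree:")
    return "\n\n".join(lines)

demos = [("She reads books.",
          "(S (NP (PRP She)) (VP (VBZ reads) (NP (NNS books))) (. .))")]
print(build_prompt(demos, "He writes code."))
```

In the zero-shot setting the demonstration list would simply be empty; in the full-training setting the same linearized trees serve as supervised targets rather than in-context examples.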