AI Summary
This paper addresses syntactic constituency parsing, a fundamental yet unresolved challenge in NLP, by proposing an LLM-driven parsing paradigm that eschews explicit grammar modeling. Methodologically, it introduces three tree-linearization strategies that encode constituency trees as symbol sequences, so that parsing reduces to pure text generation with LLMs. The approach is evaluated across zero-shot, few-shot, and fully supervised settings using a diverse set of models, including ChatGPT, GPT-4, OPT, LLaMA, and Alpaca. Key contributions include: (i) the first comprehensive empirical analysis revealing both the generalization bottlenecks and the latent capabilities of LLMs for constituency parsing; and (ii) strong cross-domain generalization on multiple benchmarks, with performance approaching that of state-of-the-art specialized parsers in certain configurations. The work provides empirical validation and a methodological foundation for lightweight, grammar-agnostic, and broadly applicable constituency parsing.
Abstract
Constituency parsing is a fundamental yet unsolved natural language processing task. In this paper, we explore the potential of recent large language models (LLMs), which have exhibited remarkable performance across a wide range of domains and tasks, to tackle this task. We employ three linearization strategies to transform output trees into symbol sequences, such that LLMs can solve constituency parsing by directly generating linearized trees. We conduct experiments with a diverse range of LLMs, including ChatGPT, GPT-4, OPT, LLaMA, and Alpaca, and compare their performance against state-of-the-art constituency parsers. Our experiments cover zero-shot, few-shot, and full-training settings, and we evaluate the models on one in-domain and five out-of-domain test sets. Our findings reveal insights into LLMs' performance, generalization abilities, and remaining challenges in constituency parsing.
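To make the idea of tree linearization concrete, the sketch below shows one common way to encode a constituency tree as a bracketed symbol sequence that an LLM could be trained or prompted to generate. The paper's three specific strategies are not detailed in this summary, so this is only an illustrative example of the standard bracketed encoding; the tuple-based tree representation and the `linearize` helper are assumptions for this sketch.

```python
def linearize(tree):
    """Recursively flatten a (label, children...) tuple tree into a
    bracketed token sequence such as '(S (NP (DT The) (NN cat)) ...)'."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        # Pre-terminal node: a part-of-speech tag dominating a single word.
        return f"({label} {children[0]})"
    # Internal node: recurse over child constituents and join their encodings.
    return "(" + label + " " + " ".join(linearize(c) for c in children) + ")"

# Toy tree for the sentence "The cat sleeps".
tree = ("S",
        ("NP", ("DT", "The"), ("NN", "cat")),
        ("VP", ("VBZ", "sleeps")))

print(linearize(tree))
# (S (NP (DT The) (NN cat)) (VP (VBZ sleeps)))
```

Under this framing, parsing becomes a sequence-generation problem: the model receives the raw sentence and must emit the linearized tree, which can then be deserialized and scored against the gold tree with standard constituency metrics.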