Unsupervised Chunking with Hierarchical RNN

📅 2023-09-10

🏛️ arXiv.org

📈 Citations: 3

✨ Influential: 0

🤖 AI Summary

Unsupervised syntactic chunking remains challenging due to the absence of explicit structural annotations. Method: We propose a two-layer hierarchical RNN (HRNN) that models word → chunk and chunk → sentence composition, where chunking itself groups words in a non-hierarchical manner. Training follows a two-stage paradigm: (1) pretraining with an unsupervised parser, followed by (2) finetuning on downstream NLP tasks. Contribution/Results: On the CoNLL-2000 dataset, the method improves over existing unsupervised approaches by up to 6 percentage points in phrase-level F1, and finetuning on downstream tasks yields a further gain. We also observe that the chunking structure emerges only transiently during the model's downstream-task training, an observation relevant to unsupervised syntactic structure discovery.
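The phrase-level F1 mentioned above can be illustrated with a minimal sketch. It assumes unlabeled chunks encoded as B/I tags ("B" opens a chunk, "I" continues it) and scores a single sentence. The function names and the tag scheme are illustrative assumptions, not the paper's evaluation script.

```python
def tags_to_spans(tags):
    """Convert B/I tags into (start, end) chunk spans, end exclusive.

    Assumption: only "B" (begin) and "I" (inside) tags, no chunk types.
    """
    spans = []
    start = 0
    for i in range(1, len(tags)):
        if tags[i] == "B":
            spans.append((start, i))
            start = i
    if tags:
        spans.append((start, len(tags)))
    return spans


def phrase_f1(gold_tags, pred_tags):
    """F1 over exactly matching chunk spans for one sentence."""
    gold = set(tags_to_spans(gold_tags))
    pred = set(tags_to_spans(pred_tags))
    correct = len(gold & pred)
    if correct == 0:
        return 0.0
    precision = correct / len(pred)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: 3 gold chunks, 2 predicted chunks, 1 exact match.
gold = ["B", "I", "B", "B", "I"]
pred = ["B", "I", "B", "I", "I"]
score = phrase_f1(gold, pred)  # precision 1/2, recall 1/3
```

A chunk counts as correct only when both of its boundaries match, so one misplaced boundary can cost two chunks. A corpus-level score would typically sum the matched, predicted, and gold counts over all sentences before computing F1.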

📝 Abstract

In Natural Language Processing (NLP), predicting linguistic structures, such as parsing and chunking, has mostly relied on manual annotations of syntactic structures. This paper introduces an unsupervised approach to chunking, a syntactic task that involves grouping words in a non-hierarchical manner. We present a two-layer Hierarchical Recurrent Neural Network (HRNN) designed to model word-to-chunk and chunk-to-sentence compositions. Our approach involves a two-stage training process: pretraining with an unsupervised parser and finetuning on downstream NLP tasks. Experiments on the CoNLL-2000 dataset reveal a notable improvement over existing unsupervised methods, enhancing phrase F1 score by up to 6 percentage points. Further, finetuning with downstream tasks results in an additional performance improvement. Interestingly, we observe that the emergence of the chunking structure is transient during the neural model's downstream-task training. This study contributes to the advancement of unsupervised syntactic structure discovery and opens avenues for further research in linguistic theory.
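The two-layer composition described in the abstract can be sketched as a toy encoder. This is a simplified illustration under stated assumptions, not the authors' architecture: the cells are plain tanh (Elman-style) RNNs with random weights, chunk boundaries are given as hard 0/1 inputs rather than predicted, and the word-level state is reset after each chunk. All names (`rnn_step`, `init_cell`, `hrnn_encode`) are hypothetical.

```python
import math
import random


def rnn_step(W, U, b, x, h):
    """One Elman-style step: h' = tanh(W x + U h + b)."""
    return [
        math.tanh(
            sum(W[i][j] * x[j] for j in range(len(x)))
            + sum(U[i][j] * h[j] for j in range(len(h)))
            + b[i]
        )
        for i in range(len(b))
    ]


def init_cell(in_dim, hid_dim, rng):
    """Random weights for one RNN cell (illustrative, untrained)."""
    W = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(hid_dim)]
    U = [[rng.uniform(-0.5, 0.5) for _ in range(hid_dim)] for _ in range(hid_dim)]
    b = [0.0] * hid_dim
    return W, U, b


def hrnn_encode(word_vecs, boundaries, word_cell, chunk_cell, hid_dim):
    """Compose words into chunks, then chunks into a sentence state.

    boundaries[t] == 1 means word t is the last word of a chunk (assumption).
    """
    h_word = [0.0] * hid_dim
    h_chunk = [0.0] * hid_dim
    chunk_states = []
    for x, is_end in zip(word_vecs, boundaries):
        # Lower layer: word-to-chunk composition.
        h_word = rnn_step(*word_cell, x, h_word)
        if is_end:
            # Upper layer: chunk-to-sentence composition.
            h_chunk = rnn_step(*chunk_cell, h_word, h_chunk)
            chunk_states.append(h_chunk)
            h_word = [0.0] * hid_dim  # start the next chunk from scratch
    return chunk_states, h_chunk


rng = random.Random(0)
word_cell = init_cell(3, 4, rng)   # 3-dim word vectors, 4-dim hidden state
chunk_cell = init_cell(4, 4, rng)  # consumes the word-level hidden state
words = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(5)]
chunk_states, sentence_state = hrnn_encode(words, [0, 1, 1, 0, 1], word_cell, chunk_cell, 4)
```

With boundaries `[0, 1, 1, 0, 1]`, the five words form three chunks, so the upper layer takes three steps and its final state serves as the sentence representation.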

Problem

Research questions and friction points this paper is trying to address.

Unsupervised chunking without manual syntactic annotations

Hierarchical RNN models word-to-chunk and chunk-to-sentence compositions

Transient emergence of chunking structures during downstream training

Innovation

Methods, ideas, or system contributions that make the work stand out.

Unsupervised chunking with Hierarchical Recurrent Neural Network

Two-stage training: pretraining with an unsupervised parser, then finetuning on downstream tasks

Transient emergence of chunking structures during training

🔎 Similar Papers

Efficient Length-Generalizable Attention via Causal Retrieval for Long-Context Language Modeling