🤖 AI Summary
This study systematically investigates performance disparities between Transformer-based and non-Transformer models in relation classification, with emphasis on contextual modeling capability, few-shot learning efficiency, and robustness to long sequences. Method: We conduct comprehensive experiments across three benchmark datasets—TACRED, TACREV, and RE-TACRED—comparing representative Transformer architectures (BERT, RoBERTa, R-BERT) against prominent non-Transformer models (PA-LSTM, C-GCN, AGGCN). Crucially, we perform the first cross-architectural analysis across multiple dimensions: data scale, sentence length, and annotation density. Contribution/Results: Our results reveal a structural advantage of Transformers in capturing long-range contextual dependencies and generalizing across settings: they achieve micro-F1 scores of 80–90%, substantially outperforming non-Transformer counterparts (64–67%). Gains are especially pronounced under low-resource conditions and for long sentences. These findings provide empirical guidance for model selection and architectural design in relation classification.
📝 Abstract
In the era of large language models, relation extraction (RE) plays an important role in information extraction by transforming unstructured raw text into structured data (Wadhwa et al., 2023). In this paper, we systematically compare the performance of deep supervised learning approaches with and without transformers. We evaluate a series of non-transformer architectures, namely PA-LSTM (Zhang et al., 2017), C-GCN (Zhang et al., 2018), and AGGCN (attention-guided GCN) (Guo et al., 2019), against a series of transformer architectures: BERT, RoBERTa, and R-BERT (Wu and He, 2019). Our comparison covers traditional metrics such as micro F1, as well as evaluations across different scenarios, varying sentence lengths, and different percentages of the dataset used for training. Our experiments were conducted on TACRED, TACREV, and RE-TACRED. The results show that transformer-based models outperform non-transformer models, achieving micro F1 scores of 80-90% compared to 64-67% for non-transformer models. Additionally, we briefly review the research journey in supervised relation classification and discuss the role and current status of large language models (LLMs) in relation extraction.
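The micro F1 metric used throughout the comparison can be sketched as follows. This is a minimal illustration assuming the standard TACRED-style scoring convention, in which the `no_relation` label is treated as the negative class and excluded from the positive counts; the relation labels in the usage example are hypothetical.

```python
def micro_f1(gold, pred, negative_label="no_relation"):
    """Micro-averaged F1 over positive relation labels.

    Aggregates true positives, false positives, and false negatives
    across all classes, ignoring the negative class as in the
    standard TACRED scorer.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        if p != negative_label and p == g:
            tp += 1  # predicted a relation and it matches gold
        if p != negative_label and p != g:
            fp += 1  # predicted a relation that is wrong
        if g != negative_label and p != g:
            fn += 1  # missed (or mislabeled) a gold relation
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0


# Hypothetical example: 2 correct positives, 1 spurious, 1 missed.
gold = ["per:title", "no_relation", "org:founded", "per:title"]
pred = ["per:title", "per:title", "org:founded", "no_relation"]
print(micro_f1(gold, pred))  # precision = recall = 2/3
```

Because micro averaging pools counts over all relation types, frequent relations dominate the score, which is why it is the conventional headline metric on TACRED and its revisions.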