Do BERT-Like Bidirectional Models Still Perform Better on Text Classification in the Era of LLMs?

📅 2025-05-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Amid the widespread adoption of large language models (LLMs), it remains unclear whether LLMs universally outperform traditional encoder-based models—such as BERT—for challenging text classification tasks, particularly under the “LLM omnipotence” assumption. Method: This work systematically benchmarks BERT-style models against LLMs across six high-difficulty text classification tasks. To dissect model capabilities, we propose a task-characteristic–driven trichotomy—pattern-driven, semantic-depth–intensive, and knowledge-dependent—and conduct the first fine-grained capability attribution analysis. Building on this, we introduce TaMAS, a task-aware model adaptive selection framework integrating fine-tuning, zero-shot inference, internal state probing, and PCA-based dimensionality reduction. Contribution/Results: Empirical results show that BERT-style models significantly surpass LLMs on pattern-driven tasks. TaMAS achieves an average accuracy improvement of 2.3%, demonstrating the effectiveness and practicality of task-driven model selection.
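The summary describes TaMAS as combining internal-state probing with PCA-based dimensionality reduction. As a rough illustration of what such a probing pipeline can look like (this is a minimal sketch under our own assumptions, not the paper's actual implementation), one can reduce LLM hidden states with PCA and fit a linear probe on the reduced features; here the hidden states are replaced with synthetic vectors carrying a weak label-dependent signal:

```python
# Minimal sketch of internal-state probing with PCA-based dimensionality
# reduction. Hidden states are synthetic stand-ins; in practice they would
# come from a forward pass of the LLM over the classification inputs.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# n examples, d hidden dimensions (d=768 mimics a BERT-sized hidden state).
n, d = 400, 768
labels = rng.integers(0, 2, size=n)
# Inject a small label-dependent shift so the probe has signal to recover.
hidden = rng.normal(size=(n, d)) + labels[:, None] * 0.5

X_tr, X_te, y_tr, y_te = train_test_split(hidden, labels, random_state=0)

# PCA-based dimensionality reduction, fit on the training split only.
pca = PCA(n_components=32).fit(X_tr)

# Linear probe on the reduced representations.
probe = LogisticRegression(max_iter=1000).fit(pca.transform(X_tr), y_tr)
acc = probe.score(pca.transform(X_te), y_te)
print(f"probe accuracy: {acc:.2f}")
```

Probe accuracy on a given layer's states is then one signal a task-aware selector could use when deciding between fine-tuned encoders and LLM-based methods; the component count and probe family here are illustrative choices.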

📝 Abstract
The rapid adoption of LLMs has overshadowed the potential advantages of traditional BERT-like models in text classification. This study challenges the prevailing "LLM-centric" trend by systematically comparing three categories of methods, i.e., fine-tuning BERT-like models, utilizing LLM internal states, and zero-shot inference, across six high-difficulty datasets. Our findings reveal that BERT-like models often outperform LLMs. We further categorize the datasets into three types, perform PCA and probing experiments, and identify task-specific model strengths: BERT-like models excel at pattern-driven tasks, while LLMs dominate tasks requiring deep semantics or world knowledge. Based on this, we propose TaMAS, a fine-grained task selection strategy, advocating a nuanced, task-driven approach over a one-size-fits-all reliance on LLMs.
Problem

Research questions and friction points this paper is trying to address.

Comparing BERT-like models and LLMs on challenging text classification tasks
Identifying the task-specific strengths of BERT-like models versus LLMs
Proposing TaMAS as a task-driven model selection strategy
Innovation

Methods, ideas, or system contributions that make the work stand out.

BERT-like models outperform LLMs on pattern-driven classification tasks
Task-specific strengths identified via PCA and probing analysis
TaMAS strategy for task-aware model selection