Exploring Variability in Fine-Tuned Models for Text Classification with DistilBERT

📅 2024-12-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study systematically investigates the joint effects of learning rate, batch size, and training epochs on accuracy, F1-score, and loss when fine-tuning DistilBERT (distilbert-base-uncased-finetuned-sst-2-english) for the SST-2 sentiment classification task. Methodologically, we employ polynomial regression to quantify both main and interaction effects among hyperparameters, complemented by statistical significance testing (p < 0.05). Results reveal that: (i) a high learning rate significantly reduces loss (p = 0.027) but degrades accuracy; (ii) batch size and epochs exhibit a strong interaction effect that significantly improves F1-score (p = 0.001); and (iii) batch size significantly affects accuracy (p = 0.028) and F1-score (p = 0.005), yet shows no significant effect on loss (p = 0.170). These findings motivate an adaptive fine-tuning paradigm that explicitly models nonlinear hyperparameter interactions, providing empirical evidence and methodological guidance for efficient fine-tuning of BERT-family models.

📝 Abstract
This study evaluates fine-tuning strategies for text classification using the DistilBERT model, specifically the distilbert-base-uncased-finetuned-sst-2-english variant. Through structured experiments, we examine the influence of hyperparameters such as learning rate, batch size, and number of epochs on accuracy, F1-score, and loss. Polynomial regression analyses capture both foundational and incremental impacts of these hyperparameters, focusing on fine-tuning adjustments relative to a baseline model. Results reveal variability in metrics across hyperparameter configurations, exposing trade-offs among performance metrics. For example, a higher learning rate reduces loss in the relative analysis (p = 0.027) but hinders accuracy gains. Meanwhile, batch size significantly affects accuracy (p = 0.028) and F1-score (p = 0.005) in the absolute regression but has limited influence on loss optimization (p = 0.170). The interaction between epochs and batch size maximizes F1-score (p = 0.001), underscoring the importance of hyperparameter interplay. These findings highlight the need for fine-tuning strategies that address non-linear hyperparameter interactions to balance performance across metrics. Such variability and metric trade-offs are relevant to tasks beyond text classification, including other NLP and computer vision applications. This analysis informs fine-tuning strategies for large language models and motivates adaptive designs for broader model applicability.
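The regression setup described in the abstract (main hyperparameter effects plus a batch-size × epochs interaction term, fitted against a performance metric) can be sketched as follows. This is an illustrative reconstruction on synthetic data, not the paper's code or measurements; the coefficient values, ranges, and variable names are all assumptions.

```python
import numpy as np

# Illustrative sketch: regress an F1-like metric on learning rate, batch
# size, epochs, and a batch_size x epochs interaction term, mirroring the
# paper's polynomial-regression analysis. Data below are synthetic.
rng = np.random.default_rng(0)
n = 60
lr = rng.uniform(1e-5, 5e-5, n)                  # learning rate
bs = rng.choice([16, 32, 64], n).astype(float)   # batch size
ep = rng.choice([2, 3, 4], n).astype(float)      # epochs

# Synthetic metric with a built-in batch_size x epochs interaction (0.0003).
f1 = 0.90 + 500.0 * lr - 0.0004 * bs + 0.002 * ep + 0.0003 * bs * ep
f1 = f1 + rng.normal(0.0, 0.001, n)              # observation noise

# Design matrix: intercept, main effects, interaction term.
X = np.column_stack([np.ones(n), lr, bs, ep, bs * ep])
beta, *_ = np.linalg.lstsq(X, f1, rcond=None)

print(dict(zip(["intercept", "lr", "bs", "ep", "bs*ep"], beta.round(5))))
```

With enough runs per configuration, the fitted interaction coefficient (here `beta[4]`) recovers the built-in batch-size × epochs effect; in the paper this is the term reported as significant for F1-score (p = 0.001), with significance assessed via p-values on each coefficient.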
Problem

Research questions and friction points this paper is trying to address.

DistilBERT Optimization
Text Classification
Parameter Tuning
Innovation

Methods, ideas, or system contributions that make the work stand out.

DistilBERT Optimization
Parameter Tuning
Performance Metrics
Giuliano Lorenzoni
University of Waterloo, Waterloo, Ontario, Canada
Ivens Portugal
University of Waterloo, Waterloo, Ontario, Canada
Paulo Alencar
Associate Director, CSG; Research Professor, University of Waterloo
software engineering, formal methods, web engineering, mobile applications, context-aware computing
Donald Cowan
University of Waterloo, Waterloo, Ontario, Canada