🤖 AI Summary
Lightweight Transformer encoders for on-device multi-task NLP deployment face optimization conflicts and efficiency bottlenecks due to competing task gradients and stringent resource constraints. Method: We propose a task-guided LoRA-based multi-task pre-finetuning framework built upon a shared lightweight BERT backbone. It employs modular, task-specific LoRA adapters for named entity recognition (NER) and text classification, enabling parameter-efficient sharing while preserving task-specific optimization. A task-aware low-rank update mechanism mitigates gradient interference across tasks. Results: Evaluated on 21 downstream tasks, our method achieves +0.8% average F1 gain for NER and +8.8% accuracy improvement for text classification—matching the performance of standalone single-task fine-tuned models—while satisfying on-device deployment constraints in model size and computational cost. This work establishes a scalable, lightweight paradigm for edge-deployable multi-task NLP that balances generality, accuracy, and efficiency.
📝 Abstract
Deploying natural language processing (NLP) models on mobile platforms requires models that can adapt across diverse applications while remaining efficient in memory and computation. We investigate pre-finetuning strategies to enhance the adaptability of lightweight BERT-like encoders for two fundamental NLP task families: named entity recognition (NER) and text classification. While pre-finetuning improves downstream performance for each task family individually, we find that naïve multi-task pre-finetuning introduces conflicting optimization signals that degrade overall performance. To address this, we propose a simple yet effective multi-task pre-finetuning framework based on task-primary LoRA modules, which enables a single shared encoder backbone with modular, task-specific adapters. Our approach achieves performance comparable to individual pre-finetuning while meeting practical deployment constraints. Experiments on 21 downstream tasks show average improvements of +0.8% for NER and +8.8% for text classification, demonstrating the effectiveness of our method for versatile mobile NLP applications.
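To make the task-primary LoRA idea concrete, here is a minimal NumPy sketch (an illustration under stated assumptions, not the authors' implementation): a single shared weight matrix stays frozen, and each task gets its own low-rank (A, B) pair that additively updates the layer. The task names `"ner"` and `"cls"`, the rank, and the scaling are illustrative choices, not values from the paper.

```python
# Sketch of per-task LoRA adapters over a shared frozen weight matrix.
# Assumptions (not from the paper): rank=4, alpha=8, tasks "ner"/"cls".
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank, alpha = 64, 64, 4, 8

# Shared backbone weight: frozen, reused by every task.
W = rng.standard_normal((d_out, d_in))

def init_lora(rank, d_in, d_out):
    # Standard LoRA init: A small random, B zero, so each adapter
    # starts as a no-op and the backbone output is unchanged.
    A = rng.standard_normal((rank, d_in)) * 0.01
    B = np.zeros((d_out, rank))
    return A, B

# One modular low-rank adapter per task family.
adapters = {task: init_lora(rank, d_in, d_out) for task in ["ner", "cls"]}

def forward(x, task):
    A, B = adapters[task]
    # y = W x + (alpha / rank) * B (A x); during pre-finetuning only
    # A and B for the active task receive gradients, which is what
    # keeps the tasks' updates from interfering in the shared W.
    return W @ x + (alpha / rank) * (B @ (A @ x))

x = rng.standard_normal(d_in)
y_ner = forward(x, "ner")
y_cls = forward(x, "cls")
```

Because B is zero-initialized, both task paths initially reproduce the frozen backbone's output; training then specializes each adapter independently while the backbone is shared across tasks, which is the parameter-efficiency argument the abstract makes.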