🤖 AI Summary
For semantic classification tasks in industrial settings, such as customer intent detection and semantic role labeling, that demand both deep domain expertise and low-latency inference, this paper proposes a token-driven sparse fine-tuning method that adds no new parameters to the model. The approach couples sensitive-token identification with selective parameter updating: it automatically discovers task-specific critical tokens from the training data and updates only the subset of model parameters most relevant to them. Evaluated on five real-world semantic classification benchmarks, the method consistently outperforms end-to-end fine-tuning, LoRA, layer-wise selection, and prefix tuning, achieving an average +2.1% improvement in classification accuracy while halving training cost, improving training stability, and preserving the lightweight advantage of small-scale models.
📝 Abstract
Semantic text classification requires understanding the contextual significance of specific tokens rather than surface-level patterns or keywords (as in rule-based or statistical text classification), making large language models (LLMs) well suited for the task. However, semantic classification applications in industry, such as customer intent detection or semantic role labeling, tend to be highly specialized: they require annotation by domain experts, unlike the general-purpose corpora used for pretraining. They also typically demand high inference throughput, which constrains model size for latency and cost reasons. For a range of specialized classification tasks, the preferred solution is therefore to develop customized classifiers by finetuning smaller language models (e.g., mini-encoders, small language models).
In this work, we develop a token-driven sparse finetuning strategy to adapt small language models to specialized classification tasks. We identify and finetune a small, sensitive subset of model parameters by leveraging task-specific token constructs in the finetuning dataset, while leaving most of the pretrained weights unchanged. Unlike adapter approaches such as low-rank adaptation (LoRA), we introduce no additional parameters to the model. Our approach identifies highly relevant semantic tokens (case study in the Appendix) and outperforms end-to-end finetuning, LoRA, layer selection, and prefix tuning on five diverse semantic classification tasks, achieving greater training stability at half the cost of end-to-end finetuning.
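The core mechanism, updating only a small subset of sensitive parameters while freezing the rest, can be sketched as follows. This is a minimal illustration under assumed details, not the paper's actual algorithm: here sensitivity is approximated by gradient magnitude (which in the paper's setting would be accumulated at the identified sensitive-token positions), and both function names are hypothetical.

```python
import numpy as np

def select_sensitive_mask(grads, sparsity=0.1):
    """Return a boolean mask keeping only the top `sparsity` fraction of
    parameters by gradient magnitude; all other parameters stay frozen."""
    flat = np.abs(grads).ravel()
    k = max(1, int(sparsity * flat.size))
    threshold = np.partition(flat, -k)[-k]
    return np.abs(grads) >= threshold

def sparse_update(params, grads, mask, lr=0.01):
    """Plain SGD step applied only to the unmasked (sensitive) parameters."""
    return params - lr * grads * mask

# Toy example: a 4x4 weight matrix where only the 25% most
# gradient-sensitive entries are allowed to change.
params = np.zeros((4, 4))
grads = np.arange(16, dtype=float).reshape(4, 4)
mask = select_sensitive_mask(grads, sparsity=0.25)
updated = sparse_update(params, grads, mask, lr=1.0)
```

Because the mask is fixed after selection, the optimizer only needs state for the retained entries, which is one plausible source of the reported training-cost savings.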