🤖 AI Summary
To address the subword over-segmentation, inflated sequence lengths, and increased inference latency that arise when large language models (LLMs) process out-of-domain text due to vocabulary mismatch, this paper proposes a **length-preserving vocabulary expansion method**. Using domain-specific term-frequency analysis, it injects high-frequency domain tokens into the pretrained tokenizer and introduces an algorithm guaranteeing that the tokenized sequence is never longer than the one produced by the original vocabulary. The method requires no model retraining, preserving tokenizer efficiency and backward compatibility. Evaluated on real-world e-commerce data, it shortens input sequences by up to 20%, substantially lowering inference latency while preserving performance on downstream tasks. Its core contribution is a **strictly length-constrained, domain-adaptive vocabulary expansion** that balances computational efficiency, system compatibility, and practical deployability.
📝 Abstract
When an LLM processes text outside its training domain(s), an often overlooked factor is vocabulary mismatch: the general-domain tokenizer fails to capture frequent domain-specific terms, producing suboptimal sub-word splits that raise token fertility and slow processing.
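The fertility metric referenced here is simply the average number of subword tokens emitted per input word. A minimal sketch, using a toy tokenizer invented purely for illustration (a real general-domain BPE tokenizer would produce the actual splits):

```python
def fertility(words, tokenize):
    """Average number of subword tokens emitted per input word."""
    total_tokens = sum(len(tokenize(w)) for w in words)
    return total_tokens / len(words)

def toy_tokenize(word, vocab=frozenset({"phone", "case"})):
    """Toy stand-in for a general-domain tokenizer: words in the
    vocabulary stay whole; unknown domain terms fall back to
    fixed-size chunks, mimicking suboptimal sub-word splits."""
    if word in vocab:
        return [word]
    return [word[i:i + 2] for i in range(0, len(word), 2)]

# In-vocabulary words: one token each, fertility 1.0.
print(fertility(["phone", "case"], toy_tokenize))        # 1.0
# Out-of-domain terms shatter into many pieces, inflating fertility.
print(fertility(["smartwatch", "powerbank"], toy_tokenize))  # 5.0
```

Higher fertility means longer token sequences for the same text, which directly increases per-request compute.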
We address this limitation by augmenting the pretrained vocabulary with a set of domain-specific tokens. To this end, we design an algorithm that extends an existing tokenizer while guaranteeing tokenization efficiency never decreases: every input sequence is segmented into at most as many tokens as before.
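One simple way to obtain such a guarantee (a sketch of the general idea, not necessarily the paper's exact algorithm) is to segment with the original tokenizer first, then greedily merge adjacent tokens whose concatenation is one of the newly added domain tokens. Since merging can only shorten the sequence, the output never exceeds the original token count:

```python
def merge_with_domain_tokens(tokens, domain_vocab):
    """Greedily merge runs of adjacent base tokens whose concatenation
    is a domain token. Output length <= input length by construction."""
    out = []
    i = 0
    while i < len(tokens):
        merged_end = None
        # Try the longest span tokens[i:j] (at least two tokens) first.
        for j in range(len(tokens), i + 1, -1):
            if "".join(tokens[i:j]) in domain_vocab:
                merged_end = j
                break
        if merged_end is not None:
            out.append("".join(tokens[i:merged_end]))
            i = merged_end
        else:
            out.append(tokens[i])
            i += 1
    return out

# Hypothetical base segmentation of "smartwatch charger":
base = ["smart", "watch", "char", "ger"]
print(merge_with_domain_tokens(base, {"smartwatch"}))
# ['smartwatch', 'char', 'ger'] -- 3 tokens instead of 4
```

With an empty domain vocabulary the function is the identity, so the guarantee degrades gracefully to the original tokenizer's behavior.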
Evaluated on real-world e-Commerce use-cases, the augmented tokenizer shortens input sequences by up to 20% and reduces inference latency on downstream tasks while preserving predictive quality. We further analyze secondary effects, such as the impact on forward-pass speed and the rate at which the model adopts the newly introduced tokens, to illustrate the broader benefits of vocabulary adaptation.