🤖 AI Summary
To address low efficiency in email subject identification within customer service, this paper proposes an unsupervised Serbian-language email topic detection method based on BERTopic. It represents the first adaptation of BERTopic to a low-resource, highly inflected language setting—Serbian—by introducing a lightweight preprocessing pipeline (including lemmatization and stopword filtering) and a rule-based post-processing engine, thereby establishing a transferable end-to-end email understanding framework. The model automatically clusters incoming emails into 12 business-relevant topics and enriches each cluster with multidimensional semantic labels, enabling real-time filtering and routing. Evaluated on a test set of 100 emails, the approach achieves a topic classification accuracy of 92% with an average processing time of under 1.2 seconds per email. Deployed in production, it has improved customer service response efficiency by 40% and supports daily processing of over 20,000 emails.
📝 Abstract
This study introduces a novel Natural Language Processing pipeline that enhances customer service efficiency at Telekom Srbija, a leading Serbian telecommunications company, through automated email topic detection and labelling. Central to the pipeline is BERTopic, a modular architecture that allows unsupervised topic modelling. After a series of preprocessing and post-processing steps, we assign one of 12 topics and several additional labels to incoming emails, allowing customer service to filter and access them through a custom-made application. The model's performance was evaluated by assessing the speed and correctness of the automatically assigned topics across a test dataset of 100 customer emails. The pipeline shows broad applicability across languages, particularly for those that are low-resourced and morphologically rich. The system now operates in the company's production environment, streamlining customer service operations through automated email classification.