Taxon: Hierarchical Tax Code Prediction with Semantically Aligned LLM Expert Guidance

πŸ“… 2026-01-13
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work proposes Taxon, a framework designed to address the compliance risks arising from inaccurate matching between e-commerce products and multi-level national tax classification systems. Taxon integrates a multimodal feature-gating mixture-of-experts (MoE) architecture, a semantic consistency verification mechanism guided by distillation from large language models, and a multi-source training pipeline that unifies tax code databases, invoice logs, and merchant registration data. To enhance structural consistency, the framework further incorporates full hierarchical path reconstruction. Evaluated on both a newly curated TaxCode dataset and public benchmarks, Taxon achieves state-of-the-art performance and has been deployed in Alibaba’s tax system, handling over 500,000 queries daily while significantly improving accuracy, interpretability, and robustness.

Technology Category

Application Category

πŸ“ Abstract
Tax code prediction is a crucial yet underexplored task in automating invoicing and compliance management for large-scale e-commerce platforms. Each product must be accurately mapped to a node within a multi-level taxonomic hierarchy defined by national standards, where errors lead to financial inconsistencies and regulatory risks. This paper presents Taxon, a semantically aligned and expert-guided framework for hierarchical tax code prediction. Taxon integrates (i) a feature-gating mixture-of-experts architecture that adaptively routes multi-modal features across taxonomy levels, and (ii) a semantic consistency model distilled from large language models acting as domain experts to verify alignment between product titles and official tax definitions. To address noisy supervision in real business records, we design a multi-source training pipeline that combines curated tax databases, invoice validation logs, and merchant registration data to provide both structural and semantic supervision. Extensive experiments on the proprietary TaxCode dataset and public benchmarks demonstrate that Taxon achieves state-of-the-art performance, outperforming strong baselines. Further, an additional full hierarchical paths reconstruction procedure significantly improves structural consistency, yielding the highest overall F1 scores. Taxon has been deployed in production within Alibaba's tax service system, handling an average of over 500,000 tax code queries per day and reaching peak volumes above five million requests during business event with improved accuracy, interpretability, and robustness.
Problem

Research questions and friction points this paper is trying to address.

tax code prediction
hierarchical taxonomy
e-commerce compliance
multi-level classification
regulatory risk
Innovation

Methods, ideas, or system contributions that make the work stand out.

hierarchical tax code prediction
mixture-of-experts
semantic alignment
LLM distillation
multi-source supervision
πŸ”Ž Similar Papers
No similar papers found.
J
Jihang Li
Data Science and Analytics Thrust, Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China
Q
Qing Liu
Alibaba Group, Hangzhou, China
Zulong Chen
Zulong Chen
Director, Alibaba Group
Machine LearningLarge Language ModelSearch&RecommendationNLP
J
Jing Wang
Alibaba Group, Hangzhou, China
Wei Wang
Wei Wang
Tongyi Lab, Alibaba Group
Generative Models
C
Chuanfei Xu
Guangdong Laboratory of Artificial Intelligence and Digital Economy (Shenzhen), Shenzhen, China
Zeyi Wen
Zeyi Wen
Assistant Professor at HKUST(Guangzhou)
Efficient LLMsMLSysHPOHPC