🤖 AI Summary
To address cross-lingual misalignment between queries and products in multilingual e-commerce search, this paper proposes a multi-stage adaptive framework. First, input quality is improved via language identification, noise filtering, and data refinement. Second, explicit language tagging and category-aware preprocessing are introduced to enhance data consistency and fine-grained category coverage. Third, the query-category (QC) classification and query-item (QI) matching tasks are optimized jointly, with hyperparameters tuned on a custom validation set. Built on a systematic exploration of diverse model architectures and on iterative evaluation, the framework improves model generalization and robustness. In an international competition it placed fifth overall (score: 0.8819), with consistently strong performance across all metrics, demonstrating its effectiveness and scalability in multilingual, multi-domain e-commerce settings.
📝 Abstract
This study presents the multilingual e-commerce search system developed by the DILAB team, which achieved 5th place on the final leaderboard with a competitive overall score of 0.8819, demonstrating stable and high-performing results across evaluation metrics. To address challenges in multilingual query-item understanding, we designed a multi-stage pipeline integrating data refinement, lightweight preprocessing, and adaptive modeling. The data refinement stage enhanced dataset consistency and category coverage, while language tagging and noise filtering improved input quality. In the modeling phase, multiple architectures and fine-tuning strategies were explored, and hyperparameters were optimized on curated validation sets to balance performance across the query-category (QC) and query-item (QI) tasks. The proposed framework exhibited robustness and adaptability across languages and domains, highlighting the effectiveness of systematic data curation and iterative evaluation for multilingual search systems. The source code is available at https://github.com/2noweyh/DILAB-Alibaba-Ecommerce-Search.
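The lightweight preprocessing described above (noise filtering followed by explicit language tagging) might look roughly like the sketch below. It is illustrative only and not taken from the linked repository: the function names `clean_text` and `preprocess`, the bracketed tag format `[en]`, and the specific cleaning rules are all assumptions, and the language code is assumed to come from an upstream language-identification step that is not shown.

```python
import re
import unicodedata
from typing import Optional


def clean_text(text: str) -> str:
    """Hypothetical noise filter: normalize Unicode, drop control/format
    characters, and collapse runs of whitespace."""
    # NFKC folds compatibility forms (e.g. full-width letters) into canonical ones.
    text = unicodedata.normalize("NFKC", text)
    # Keep whitespace, drop every other character in a Unicode "C" category
    # (control, format, unassigned, ...), such as zero-width spaces.
    text = "".join(
        ch for ch in text
        if ch.isspace() or not unicodedata.category(ch).startswith("C")
    )
    # Collapse any whitespace run to a single space and trim the ends.
    return re.sub(r"\s+", " ", text).strip()


def preprocess(query: str, lang: str) -> Optional[str]:
    """Clean a query and prepend an explicit language tag.

    The "[xx] " prefix is an assumed tag format, not one confirmed by the paper.
    Returns None when nothing is left after cleaning, so the caller can
    filter the example out.
    """
    cleaned = clean_text(query)
    if not cleaned:
        return None
    return f"[{lang.lower()}] {cleaned}"


if __name__ == "__main__":
    print(preprocess("  Red   shoes\u200b ", "EN"))
    print(preprocess(" \t ", "es"))
```

Putting the tag in the input text itself is one simple way to expose the language to a shared multilingual model without changing its architecture; the actual system may encode language differently.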