Distilled Large Language Model-Driven Dynamic Sparse Expert Activation Mechanism

πŸ“… 2026-03-21
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
This work addresses the challenge of unreliable visual recognition in scenarios characterized by high inter-class similarity, significant scale variation, and limited computational resources. To this end, we propose the DS-MoE framework, which integrates a distilled large language model with a sparse Mixture-of-Experts (MoE) architecture. Our approach employs a text-guided dynamic routing mechanism to enable semantics-driven expert activation and incorporates a lightweight MobileSAM encoder for multi-scale defect awareness, thereby achieving precise alignment between text–visual semantics and defect patterns. Evaluated on the BBMP, aluminum foil, and PCB datasets, DS-MoE outperforms YOLOv8 and YOLOX by 13.9, 1.4, and 2.0 percentage points in mAP@0.5:0.95, respectively, while also delivering notable improvements in both precision and recall.
πŸ“ Abstract
High inter-class similarity, extreme scale variation, and limited computational budgets hinder reliable visual recognition across diverse real-world data. Existing vision-centric and cross-modal approaches often rely on rigid fusion mechanisms and heavy annotation pipelines, leading to sub-optimal generalization. We propose the Distilled Large Language Model (LLM)-Driven Sparse Mixture-of-Experts (DS-MoE) framework, which integrates text-guided dynamic routing and lightweight multi-scale comprehension. The DS-MoE framework dynamically aligns textual semantics with defect-specific visual patterns through a sparse MoE architecture, where task-relevant experts are adaptively activated based on semantic relevance, resolving inter-class ambiguity. A lightweight MobileSAM encoder enables real-time inference while preserving multi-scale defect details. Extensive experiments on PCB, aluminum foil, and mold defect datasets demonstrate that our framework achieves superior performance compared to existing pure vision models. DS-MoE surpasses YOLOv8/YOLOX with gains of +13.9, +1.4, and +2.0 pp mAP@0.5:0.95 on BBMP, aluminum, and PCB, respectively, while also improving precision and recall.
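The text-guided sparse routing described above can be sketched in a few lines: experts are scored by semantic relevance to a text embedding, and only the top-k are activated. This is a minimal illustrative sketch, assuming cosine-similarity gating over learned expert keys; the toy linear experts and random embeddings stand in for the paper's actual defect-specialized subnetworks and distilled-LLM features.

```python
import numpy as np


def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()


def text_guided_route(text_emb, expert_keys, k=2):
    """Score each expert by cosine similarity between the text embedding
    and its key, then keep only the top-k (sparse activation)."""
    keys = expert_keys / np.linalg.norm(expert_keys, axis=1, keepdims=True)
    q = text_emb / np.linalg.norm(text_emb)
    scores = keys @ q                       # semantic relevance per expert
    topk = np.argsort(scores)[-k:]          # indices of the k most relevant experts
    gates = np.zeros_like(scores)
    gates[topk] = softmax(scores[topk])     # renormalize over the active set
    return gates


def moe_forward(x, experts, gates):
    """Combine only the activated experts' outputs, weighted by their gates."""
    out = np.zeros_like(x)
    for i, g in enumerate(gates):
        if g > 0:                           # inactive experts are skipped entirely
            out += g * experts[i](x)
    return out


rng = np.random.default_rng(0)
d, n_experts = 8, 4
expert_keys = rng.normal(size=(n_experts, d))
# Toy experts: fixed linear maps standing in for defect-specialized subnetworks.
weights = [rng.normal(size=(d, d)) for _ in range(n_experts)]
experts = [lambda x, W=W: W @ x for W in weights]

text_emb = rng.normal(size=d)               # e.g. embedding of a defect description
gates = text_guided_route(text_emb, expert_keys, k=2)
y = moe_forward(rng.normal(size=d), experts, gates)
print(int((gates > 0).sum()))               # 2 experts active
```

Because inactive experts are never evaluated, compute scales with k rather than with the total number of experts, which is what makes sparse activation attractive under tight computational budgets.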
Problem

Research questions and friction points this paper is trying to address.

inter-class similarity
scale variation
computational budgets
visual recognition
real-world data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Distilled LLM
Sparse Mixture-of-Experts
Text-guided dynamic routing
Lightweight multi-scale encoding
Inter-class ambiguity resolution
Qinghui Chen
School of Control Science and Engineering, Shandong University, Jinan, China; Laoshan Laboratory, Qingdao, China
Zekai Zhang
School of Control Science and Engineering, Shandong University, Jinan, China
Zaigui Zhang
Jinan Inspur Data Technology Co., Ltd, Jinan, China
Kai Zhang
University of Science and Technology of China
Artificial Intelligence · NLP · Knowledge Inference · LLMs Reasoning
Dagang Li
Macau University of Science and Technology
Network · Graph · Time series · RL · LLM
Wenmin Wang
Professor, Macau University of Science and Technology
Computer Vision · Multimodal Information Processing · Artificial Intelligence and Machine Learning
Jinglin Zhang
School of Control Science and Engineering, Shandong University, Jinan, China
Cong Liu
NOVA Information Management School, Nova University of Lisbon, Lisbon, Portugal