Hierarchical LoRA MoE for Efficient CTR Model Scaling

📅 2025-10-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the dual challenges in CTR prediction—excessive computational overhead from vertical model scaling (i.e., deepening networks) and limited hierarchical modeling capability in horizontal scaling (i.e., flat MoE layers)—this paper proposes HiLoMoE, a hierarchical LoRA-MoE framework. The method integrates Low-Rank Adaptation (LoRA) with Mixture of Experts (MoE), introducing lightweight rank-1 experts and a multi-layer routing mechanism guided by the preceding layer's routing scores rather than its outputs, which allows all MoE layers to execute in parallel. A three-stage training strategy is further designed to ensure convergence stability and expert diversity. Evaluated on four public benchmarks, the proposed model achieves an average AUC improvement of 0.20% and reduces FLOPs by 18.5% over non-MoE baselines. To the authors' knowledge, this is the first work to enable parameter-efficient, computation-aware, and hierarchy-aware large-scale CTR model scaling.

📝 Abstract
Deep models have driven significant advances in click-through rate (CTR) prediction. While vertical scaling via layer stacking improves model expressiveness, the layer-by-layer sequential computation poses challenges to efficient scaling. Conversely, horizontal scaling through Mixture of Experts (MoE) achieves efficient scaling by activating a small subset of experts in parallel, but flat MoE layers may struggle to capture the hierarchical structure inherent in recommendation tasks. To push the Return-On-Investment (ROI) boundary, we explore the complementary strengths of both directions and propose HiLoMoE, a hierarchical LoRA MoE framework that enables holistic scaling in a parameter-efficient manner. Specifically, HiLoMoE employs lightweight rank-1 experts for parameter-efficient horizontal scaling, and stacks multiple MoE layers with hierarchical routing to enable combinatorially diverse expert compositions. Unlike conventional stacking, HiLoMoE routes based on prior-layer scores rather than outputs, allowing all layers to execute in parallel. A principled three-stage training framework ensures stable optimization and expert diversity. Experiments on four public datasets show that HiLoMoE achieves a better performance-efficiency tradeoff, with an average AUC improvement of 0.20% and an 18.5% reduction in FLOPs compared to the non-MoE baseline.
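The rank-1 LoRA experts described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration of the general idea, not the authors' implementation; the softmax router, top-k selection, and all dimensions are illustrative assumptions.

```python
import numpy as np

def rank1_lora_moe(x, W0, A, B, Wg, top_k=2):
    """One LoRA-MoE layer: a frozen base weight W0 plus a sparse mixture
    of rank-1 experts, expert i being the outer product a_i b_i^T.

    x  : (d_in,)            input vector
    W0 : (d_out, d_in)      shared base weight
    A  : (n_experts, d_out) column vectors a_i
    B  : (n_experts, d_in)  row vectors b_i
    Wg : (n_experts, d_in)  router weight (illustrative choice)
    """
    # Router: softmax scores, keep only the top-k experts (sparse activation).
    logits = Wg @ x
    scores = np.exp(logits - logits.max())
    scores /= scores.sum()
    top = np.argsort(scores)[-top_k:]

    # Base path plus weighted rank-1 updates: (a_i b_i^T) x = a_i * (b_i . x),
    # so each expert costs O(d_in + d_out) instead of O(d_in * d_out).
    out = W0 @ x
    for i in top:
        out += scores[i] * A[i] * (B[i] @ x)
    return out, scores
```

Because each expert is a rank-1 outer product, adding an expert grows the parameter count by only d_in + d_out rather than d_in × d_out, which is what makes horizontal scaling parameter-efficient.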
Problem

Research questions and friction points this paper is trying to address.

Efficiently scaling CTR models while maintaining hierarchical feature learning
Overcoming sequential computation bottlenecks in vertical model scaling
Enhancing parameter efficiency through hierarchical LoRA MoE architecture
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical LoRA MoE framework for efficient scaling
Lightweight rank-1 experts enable parameter-efficient horizontal scaling
Parallel execution across layers via hierarchical routing on prior-layer scores instead of outputs
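The routing idea in the bullets above can be illustrated with a toy sketch: each layer's router consumes the previous layer's score vector rather than its output, so the only sequential dependency is a cheap score chain, and the heavy per-layer expert computations become independent. This is an illustration of the dependency structure under assumed shapes and a summed output, not the paper's exact architecture.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def hierarchical_route(x, routers, experts):
    """Stacked MoE layers where layer l routes on layer l-1's *scores*,
    not its outputs, so every layer's expert pass depends only on x and
    a cheap score chain.

    routers[0]  : (n_experts, d)         routes the raw input
    routers[l>0]: (n_experts, n_experts) maps prev scores -> next logits
    experts[l]  : list of (d, d) per-expert weight matrices
    """
    # Cheap sequential pass: propagate routing scores layer by layer.
    scores = [softmax(routers[0] @ x)]
    for Wr in routers[1:]:
        scores.append(softmax(Wr @ scores[-1]))

    # Heavy pass: each layer's experts read only x and that layer's scores,
    # so these iterations are independent and could run in parallel.
    out = np.zeros_like(x)
    for s, layer in zip(scores, experts):
        for w, W in zip(s, layer):
            out += w * (W @ x)
    return out, scores
```

Contrast with conventional stacking, where layer l's router reads layer l-1's full output: there the expensive expert computations themselves form the sequential chain, which is the bottleneck the score-based routing removes.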