CCoE: A Compact and Efficient LLM Framework with Multi-Expert Collaboration for Resource-Limited Settings

📅 2024-07-16
📈 Citations: 1
Influential: 0
🤖 AI Summary
To balance multi-domain performance against deployment efficiency for large language models (LLMs) under resource constraints, this paper proposes CCoE, a modular multi-expert collaboration framework. It combines a shared backbone with independently trained domain-specific expert subnetworks, coupled with rule-based gating and expert planning for task allocation. On five downstream domains, the method achieves performance comparable to current domain-specific LLMs. It reduces memory usage by 61.3% compared to existing multi-domain model ensemble methods and improves inference efficiency by 0.76× over parameter-efficient multi-expert integration approaches. Its core contribution is the combination of rule-based gating, expert planning, and a compact collaborative architecture, which lets experts cooperate on complex reasoning tasks while lowering inference cost.

📝 Abstract
Large Language Models (LLMs) have achieved exceptional performance across diverse domains through training on massive datasets. However, scaling LLMs to support multiple downstream domain applications remains a significant challenge, especially under resource constraints. Existing approaches often struggle to balance performance across multiple domains with resource efficiency, limiting their broader applicability. To address this, we introduce the CCoE architecture, a modular framework that seamlessly integrates domain-specific experts into a unified LLM. By leveraging independently trained expert subnetworks on a shared backbone partition, CCoE achieves state-of-the-art performance while significantly reducing the resource requirements for multi-expert deployments. Furthermore, rule-based gating and expert planning in CCoE enable flexible task allocation, promoting expert collaboration to handle complex reasoning tasks. CCoE not only reduces inference costs but also provides a flexible and scalable solution for integrating domain expertise across diverse applications. Experiments on five domains demonstrate that CCoE achieves comparable performance to current domain-specific LLMs. Moreover, compared to existing multi-domain model ensemble methods, CCoE reduces memory usage by 61.3%, while improving inference efficiency by 0.76x over parameter-efficient multi-expert integration approaches.
Problem

Research questions and friction points this paper is trying to address.

Balancing LLM performance and resource efficiency
Scaling LLMs for multi-domain applications under constraints
Reducing memory usage and improving inference efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Modular framework integrates domain-specific experts
Rule-based gating enables flexible task allocation
Reduces memory usage by 61.3% compared to multi-domain model ensemble methods
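
The rule-based gating idea listed above can be illustrated with a minimal sketch. The abstract does not specify the actual rules, so everything here is an assumption made for illustration: the `Expert` class, the `route` function, the keyword-matching rule, the example domains, and the `"backbone"` fallback name are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Expert:
    """Hypothetical description of one domain expert and its gating rule."""
    name: str
    keywords: List[str]  # assumed rule: a query matches if it contains any keyword


def route(query: str, experts: List[Expert], fallback: str = "backbone") -> List[str]:
    """Return the names of all experts whose keyword rule matches the query.

    Several experts may match, which is one simple way to model collaboration
    on a multi-domain query. If no rule fires, the query goes to the fallback.
    """
    text = query.lower()
    matched = [e.name for e in experts if any(k in text for k in e.keywords)]
    return matched or [fallback]


# Illustrative experts; the domains and keywords are made up for this sketch.
EXPERTS = [
    Expert("math", ["integral", "equation", "solve"]),
    Expert("code", ["python", "function", "bug"]),
    Expert("medical", ["symptom", "diagnosis"]),
]

if __name__ == "__main__":
    print(route("Solve the integral and write Python code for it", EXPERTS))
    print(route("What is the capital of France?", EXPERTS))
```

A deterministic rule like this needs no trained router, which is one plausible reason a rule-based gate suits resource-limited settings; a real system would likely use richer rules than substring matching.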