🤖 AI Summary
Large language models (LLMs) face high fine-tuning costs, significant inference latency, limited edge deployability, and reliability issues, while small language models (SLMs) exhibit weak generalization; the two model families thus have complementary strengths and weaknesses in practical deployment. Method: Through a systematic literature review and taxonomic analysis, this paper establishes a unified classification of SLM-LLM collaboration organized by collaboration objective, spanning four dimensions: performance enhancement, cost-effectiveness, cloud-edge privacy, and trustworthiness. It further synthesizes cross-scenario collaboration paradigms, distills key design principles, and identifies core trade-offs among efficiency, security, and scalability. Contribution/Results: The work delivers a rigorous analytical framework for both theoretical modeling and real-world deployment of SLM-LLM synergy, offering a foundational taxonomy and concrete directions for future research and engineering.
📝 Abstract
Large language models (LLMs) have advanced many domains and applications but face high fine-tuning costs, inference latency, limited edge deployability, and reliability concerns. Small language models (SLMs) are compact, efficient, and adaptable, and offer complementary remedies. Recent work explores collaborative frameworks that fuse SLMs' specialization and efficiency with LLMs' generalization and reasoning to meet diverse objectives across tasks and deployment scenarios. Motivated by these developments, this paper presents a systematic survey of SLM-LLM collaboration organized by collaboration objective. We propose a taxonomy with four goals: performance enhancement, cost-effectiveness, cloud-edge privacy, and trustworthiness. Within this framework, we review representative methods, summarize design paradigms, and outline open challenges and future directions toward efficient, secure, and scalable SLM-LLM collaboration.