🤖 AI Summary
Large language models (LLMs) incur prohibitively high computational cost, resource consumption, and deployment expense in both training and inference. To address this, this work proposes the first holistic model-system co-design paradigm for efficient LLMs, unifying key optimization avenues (including sparsification, quantization, attention-mechanism optimization, memory-aware scheduling, compiler-level acceleration, and hardware adaptation) into a coherent framework. Through cross-layer joint optimization, we establish a full-stack technical taxonomy spanning training and inference, and publicly release a structured, open-source knowledge base. Beyond systematically categorizing and evaluating state-of-the-art efficient-LLM techniques, our contribution provides a reusable methodological framework and practical implementation guidelines. This significantly improves model efficiency, affordability, and accessibility, establishing a standardized research infrastructure for both academia and industry.
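To make one of the listed avenues concrete, here is a minimal, self-contained sketch of post-training weight quantization, one of the simplest efficiency techniques the taxonomy covers. It is an illustrative example only (symmetric per-tensor int8 quantization with NumPy), not a method from the surveyed papers; the function names are hypothetical.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: w is approximated by scale * q."""
    scale = np.abs(w).max() / 127.0           # map the largest magnitude to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float32 weight matrix from int8 codes."""
    return q.astype(np.float32) * scale

# Round-trip a random weight matrix and measure the reconstruction error.
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
max_err = float(np.abs(w - w_hat).max())      # bounded by half a quantization step
```

Storing `q` instead of `w` cuts weight memory by 4x relative to float32, at the cost of a bounded rounding error per entry; real systems refine this with per-channel scales and calibration.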
📝 Abstract
This paper focuses on modern efficient training and inference technologies for foundation models and examines them from two perspectives: model design and system design. The two perspectives optimize LLM training and inference from different angles to save computational resources, making LLMs more efficient, affordable, and accessible. The paper-list repository is available at [https://github.com/NoakLiu/Efficient-Foundation-Models-Survey](https://github.com/NoakLiu/Efficient-Foundation-Models-Survey).