🤖 AI Summary
To address low testing and debugging efficiency and immature toolchains when deploying rapidly evolving large language models (LLMs) on emerging platforms (e.g., browsers and mobile devices), this paper proposes TapML, a top-down, test-driven framework. Methodologically, TapML introduces (1) an operator-wise test-carving technique that automatically generates high-coverage, realistic test inputs; (2) a progressive cross-platform migration strategy that significantly narrows the debugging scope for compound errors; and (3) support for emerging backends such as Metal and WebGPU, with deep integration into MLC-LLM. Over two years, TapML has enabled the efficient deployment of 105 emerging models, spanning 27 distinct architectures, across 5 emerging platforms, reducing average deployment time by 42%. It has since become the default development paradigm for MLC-LLM.
📝 Abstract
While existing machine learning (ML) frameworks focus on established platforms, such as running CUDA on server-grade GPUs, there is growing demand to enable emerging AI applications in a broader set of scenarios, such as running Large Language Models (LLMs) within browsers and on mobile phones. However, deploying emerging models on new platforms (such as Metal and WebGPU) presents significant software engineering challenges due to rapid model evolution and the limited tooling and practices for these platforms. Previous practice for ML model deployment often follows a bottom-up fashion: engineers first implement the individual required operators and then put them together. However, this traditional development approach fails to meet the productivity requirements of deploying emerging ML applications, with testing and debugging as the bottleneck. To this end, we introduce TapML, a top-down approach designed to streamline model deployment on diverse platforms. While the traditional bottom-up approach requires crafting manual tests, TapML automatically creates high-quality, realistic test data through operator-wise test carving. Furthermore, TapML uses a migration-based strategy to gradually offload the model implementation from the mature source platform to the target platform, minimizing the debugging scope of compound errors. TapML has been used as the default development method in the MLC-LLM project to deploy emerging ML models. Within 2 years, TapML has accelerated the deployment of 105 emerging models of 27 model architectures across 5 emerging platforms. We show that TapML effectively boosts developer productivity while ensuring the quality of deployed models. Furthermore, we summarize comprehensive case studies from our real-world development, offering best practices for developing emerging ML systems.
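The idea of operator-wise test carving can be illustrated with a minimal, self-contained sketch. All names here are hypothetical for illustration and are not the TapML API: the model is a toy pipeline of two operators, and carving records each operator's real input and output during one reference run on the trusted source platform, yielding realistic unit-test cases to replay against the target-platform implementations.

```python
# Hypothetical sketch of operator-wise test carving (illustrative names,
# not the TapML API). Reference (source-platform) operators:
def relu(x):
    return [max(v, 0.0) for v in x]

def scale(x):
    return [v * 2.0 for v in x]

def carve(ops, model_input):
    """Run the operator pipeline once, carving an (input, output) case per op."""
    cases, x = [], model_input
    for op in ops:
        y = op(x)
        cases.append((op.__name__, x, y))  # realistic carved data, not synthetic
        x = y
    return cases

# Target-platform implementations under test (here intentionally equivalent,
# standing in for, e.g., a Metal or WebGPU kernel).
target_impls = {"relu": relu, "scale": scale}

cases = carve([relu, scale], [-1.0, 0.5, 3.0])
for name, inp, expected in cases:
    assert target_impls[name](inp) == expected  # operator-level check
```

Because each carved case pins down a single operator with data drawn from a real model execution, a failing replay points directly at the misbehaving target-platform kernel.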
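The migration-based strategy can be sketched in the same toy setting. Again, all names are hypothetical and not the TapML API: the model runs as a hybrid pipeline in which each operator is dispatched to the target backend only once it has been migrated, so a regression after a migration step is localized to the newest operator instead of being compounded across the whole model.

```python
# Hypothetical sketch of migration-based offloading (illustrative names,
# not the TapML API). Source-platform reference operators:
def source_relu(x):
    return [max(v, 0.0) for v in x]

def source_scale(x):
    return [v * 2.0 for v in x]

# New target-backend implementation under validation:
def target_relu(x):
    return [v if v > 0.0 else 0.0 for v in x]

SOURCE_OPS = [("relu", source_relu), ("scale", source_scale)]
TARGET_OPS = {"relu": target_relu}

def run_hybrid(migrated, x):
    """Run the pipeline, offloading already-migrated operators to the target."""
    for name, src_fn in SOURCE_OPS:
        fn = TARGET_OPS[name] if name in migrated else src_fn
        x = fn(x)
    return x

# Validate each migration step against the all-source reference output.
reference = run_hybrid(set(), [-1.0, 0.5, 3.0])
assert run_hybrid({"relu"}, [-1.0, 0.5, 3.0]) == reference
```

Migrating one operator at a time keeps the end-to-end model runnable throughout, so developers always have a working baseline to diff against rather than debugging a fully rewritten model at once.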