🤖 AI Summary
Existing open-source frameworks struggle to enable practical fine-tuning of large language models (LLMs) on commercial mobile devices. This paper introduces MobileFineTuner, the first end-to-end open-source fine-tuning framework tailored for smartphones, supporting both full-parameter and parameter-efficient fine-tuning (e.g., LoRA). It systematically addresses critical constraints—including memory scarcity, limited computational capacity, and energy sensitivity—through three key innovations: parameter sharding, gradient accumulation, and energy-aware computation scheduling. These techniques significantly improve resource efficiency while preserving usability. The framework has been successfully deployed on mainstream Android smartphones to perform complete fine-tuning of models including GPT-2, Gemma 3, and Qwen 2.5. Experiments demonstrate a 42% reduction in peak memory consumption and a 37% decrease in energy consumption compared to baseline approaches. This work establishes a technically viable pathway and provides foundational open-source infrastructure for on-device LLM training.
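Of the three optimizations named above, gradient accumulation is the most self-contained: it trades peak memory for extra forward/backward passes by splitting a batch into micro-batches and summing their (appropriately scaled) gradients before a single optimizer step. The sketch below is purely illustrative and is not taken from the MobileFineTuner codebase; it uses a toy linear least-squares model, and the names `mse_grad` and `accumulated_step` are invented for the example.

```python
import numpy as np

def mse_grad(w, X, y):
    # Gradient of 0.5 * mean((X @ w - y)^2) with respect to w.
    return X.T @ (X @ w - y) / len(y)

def accumulated_step(w, X, y, micro_batch, lr=0.1):
    """One optimizer step whose gradient is accumulated over micro-batches,
    so only `micro_batch` rows need to be resident in memory at a time."""
    grad = np.zeros_like(w)
    n = len(y)
    for start in range(0, n, micro_batch):
        Xb, yb = X[start:start + micro_batch], y[start:start + micro_batch]
        # Weight each micro-batch gradient by its share of the full batch,
        # so the accumulated gradient matches the full-batch gradient exactly.
        grad += mse_grad(w, Xb, yb) * (len(yb) / n)
    return w - lr * grad
```

Because the per-micro-batch gradients are averaged with their batch-size weights, the update is mathematically identical to a full-batch step; only the peak activation memory changes.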
📝 Abstract
Mobile phones are the most ubiquitous end devices, generating vast amounts of human-authored data and serving as the primary platform for on-device applications. As high-quality public data for large language models (LLMs) approaches exhaustion, on-device fine-tuning provides an opportunity to leverage private user data while preserving privacy. However, existing approaches are predominantly simulation-based or rely on IoT devices and PCs, leaving commodity mobile phones largely unexplored. A key gap is the absence of an open-source framework that enables practical LLM fine-tuning on mobile phones. We present MobileFineTuner, a unified open-source framework that enables end-to-end LLM fine-tuning directly on commodity mobile phones. MobileFineTuner is designed for efficiency, scalability, and usability, supporting full-parameter fine-tuning (Full-FT) and parameter-efficient fine-tuning (PEFT). To address the memory and energy limitations inherent to mobile phones, we introduce system-level optimizations including parameter sharding, gradient accumulation, and energy-aware computation scheduling. We demonstrate the practicality of MobileFineTuner by fine-tuning GPT-2, Gemma 3, and Qwen 2.5 on real mobile phones. Extensive experiments and ablation studies validate the effectiveness of the proposed optimizations and establish MobileFineTuner as a viable foundation for future research on on-device LLM training.
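The abstract cites PEFT with LoRA as the main parameter-efficient method. The core idea, which the paper assumes rather than restates, is to freeze the base weight matrix `W` and learn only a low-rank update `(alpha/r) * B @ A`, shrinking trainable-parameter count and optimizer-state memory. The following numpy sketch is a minimal illustration under those standard LoRA conventions (zero-initialized `B`, scaling `alpha/r`); the class `LoRALinear` is hypothetical and not part of MobileFineTuner's API.

```python
import numpy as np

class LoRALinear:
    """Frozen base weight W plus a trainable low-rank update (alpha/r) * B @ A."""
    def __init__(self, W, r=4, alpha=8, seed=0):
        rng = np.random.default_rng(seed)
        out_dim, in_dim = W.shape
        self.W = W                                  # frozen base weight
        self.A = rng.normal(0, 0.02, (r, in_dim))   # trainable, small random init
        self.B = np.zeros((out_dim, r))             # trainable, zero init:
                                                    # update starts as a no-op
        self.scale = alpha / r

    def forward(self, x):
        # Effective weight = W + scaled low-rank delta.
        return x @ (self.W + self.scale * self.B @ self.A).T

    def trainable_params(self):
        # Only A and B receive gradients; W stays frozen.
        return self.A.size + self.B.size
```

For a 768x768 layer with r=4 this trains ~6K parameters instead of ~590K, which is what makes PEFT attractive under mobile memory budgets.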