MobileKernelBench: Can LLMs Write Efficient Kernels for Mobile Devices?

📅 2026-03-12
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the high engineering complexity and data scarcity challenges in mobile operator generation by presenting the first systematic evaluation of large language models (LLMs) for CPU kernel code synthesis. The authors propose MoKA, a multi-agent generation framework that integrates repository-aware reasoning with a plan-and-execute mechanism, coupled with an automated compilation and performance validation pipeline. Evaluated on the newly introduced MobileKernelBench benchmark, MoKA achieves a 93.7% compilation success rate, with 27.4% of the generated kernels outperforming native implementations on the MNN CPU backend. These results demonstrate substantial improvements in both generation efficiency and runtime performance, highlighting the potential of LLM-driven approaches for practical mobile kernel development.

📝 Abstract
Large language models (LLMs) have demonstrated remarkable capabilities in code generation, yet their potential for generating kernels specifically for mobile devices remains largely unexplored. In this work, we extend the scope of automated kernel generation to the mobile domain to investigate the central question: Can LLMs write efficient kernels for mobile devices? To enable systematic investigation, we introduce MobileKernelBench, a comprehensive evaluation framework comprising a benchmark prioritizing operator diversity and cross-framework interoperability, coupled with an automated pipeline that bridges the host-device gap for on-device verification. Leveraging this framework, we conduct extensive evaluation on the CPU backend of Mobile Neural Network (MNN), revealing that current LLMs struggle with the engineering complexity and data scarcity inherent to mobile frameworks; standard models and even fine-tuned variants exhibit high compilation failure rates (over 54%) and negligible performance gains due to hallucinations and a lack of domain-specific grounding. To overcome these limitations, we propose the Mobile Kernel Agent (MoKA), a multi-agent system equipped with repository-aware reasoning and a plan-and-execute paradigm. Validated on MobileKernelBench, MoKA achieves state-of-the-art performance, boosting compilation success to 93.7% and enabling 27.4% of generated kernels to deliver measurable speedups over native libraries.
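The evaluation protocol the abstract describes (compile each generated kernel, run it on device, and compare against MNN's native implementation) can be illustrated with a small scoring routine. This is a hedged sketch of the aggregate metrics only; the class and function names below are illustrative assumptions, not the paper's actual pipeline:

```python
from dataclasses import dataclass

@dataclass
class KernelResult:
    # Outcome of one generated kernel: did it compile, and how fast
    # did it run versus the native MNN implementation (milliseconds)?
    compiled: bool
    gen_ms: float = 0.0
    native_ms: float = 0.0

def summarize(results):
    """Compute benchmark-style aggregates: compilation success rate and
    the fraction of kernels that beat the native baseline."""
    n = len(results)
    compiled = [r for r in results if r.compiled]
    faster = [r for r in compiled if r.gen_ms < r.native_ms]
    return {
        "compile_rate": len(compiled) / n,   # cf. the paper's 93.7%
        "speedup_rate": len(faster) / n,     # cf. the paper's 27.4%
    }

results = [
    KernelResult(True, 1.2, 1.5),   # compiles and outperforms native
    KernelResult(True, 2.0, 1.8),   # compiles but is slower
    KernelResult(False),            # compilation failure
]
print(summarize(results))
```

In this toy run, two of three kernels compile and one of three outperforms the baseline, mirroring how the paper's headline numbers (93.7% compilation success, 27.4% speedups) would be tallied over the full benchmark.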
Problem

Research questions and friction points this paper is trying to address.

LLM
mobile kernel
code generation
performance optimization
on-device execution
Innovation

Methods, ideas, or system contributions that make the work stand out.

MobileKernelBench
LLM code generation
mobile kernel optimization
multi-agent system
on-device evaluation
Xingze Zou
Zhejiang University
Jing Wang
Zhejiang University
Yuhua Zheng
Zhejiang University
Xueyi Chen
Master of Science, The Chinese University of Hong Kong
MLLM, Agent-driven Reasoning
Haolei Bai
Westlake University
Lingcheng Kong
HKUST
Syed A. R. Abu-Bakar
Universiti Teknologi Malaysia
Zhaode Wang
Alibaba
Chengfei Lv
Alibaba
Haoji Hu
Zhejiang University, China
Machine Learning, Computer Vision, Deep Learning
Huan Wang
Westlake University
Efficient AI, Computer Vision, Machine Learning