EdgeFM: Efficient Edge Inference for Vision-Language Models

📅 2026-04-30
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the challenges of deploying vision-language models (VLMs) on edge devices, namely low-latency requirements, resource constraints, hardware lock-in, and poor cross-platform adaptability, by proposing EdgeFM, a lightweight agent-driven inference framework. EdgeFM reduces single-request latency by removing non-essential features, and packages agent-tuned low-level kernel optimizations as a modular library of reusable skills that can be invoked directly instead of relying on closed-source vendor toolchains. It supports mainstream edge hardware including x86 and NVIDIA Orin SoCs, and is reported as the first end-to-end VLA deployment on the domestic Horizon Journey platform. In most reported cases EdgeFM outperforms conventional vendor-specific toolchains, with up to a 1.49× speedup over TensorRT-Edge-LLM on the Orin platform, and it is offered as an efficient, stable, open-source solution for edge inference.
📝 Abstract
Vision-language models (VLMs) have demonstrated strong applicability in edge industrial applications, yet their deployment remains severely constrained by requirements for deterministic low latency and stable execution under resource limitations. Existing frameworks either rely on bloated general-purpose designs or force developers into opaque, hardware-specific closed-source ecosystems, leading to hardware lock-in and poor cross-platform adaptability. Observing that modern AI agents can efficiently search and tune configurations to generate highly optimized low-level kernels for standard LLM operators, we propose EdgeFM, a lightweight, agent-driven VLM/LLM inference framework tailored for cross-platform industrial edge deployment. EdgeFM removes non-essential features to reduce single-request latency, and encapsulates agent-tuned kernel optimizations as a modular library of reusable skills. By allowing direct invocation of these skills rather than waiting for closed-source implementations, it effectively closes the performance gap long dominated by proprietary toolchains. The framework natively supports mainstream platforms including x86 and NVIDIA Orin SoCs, and represents the first end-to-end VLA deployment on the domestic Horizon Journey platform, enhancing cross-platform portability. In most cases, it yields clearly better inference performance than conventional vendor-specific toolchains, achieving up to a 1.49× speedup over TensorRT-Edge-LLM on the NVIDIA Orin platform. Experimental results show that EdgeFM delivers favorable end-to-end inference performance, providing an open-source, production-grade solution for diverse edge industrial scenarios.
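The abstract's idea of searching over configurations and storing the best result as a reusable skill can be illustrated with a minimal Python sketch. This is not EdgeFM's API: `Skill`, `SkillLibrary`, `toy_latency`, and the tuning knobs (`tile`, `unroll`) are hypothetical names chosen for illustration. A plain grid search stands in for the agent-driven search, and a toy cost model stands in for on-device kernel timing.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, Sequence, Tuple

Config = Tuple[int, int]  # (tile, unroll): hypothetical tuning knobs


@dataclass(frozen=True)
class Skill:
    """A tuned operator configuration, stored so it can be reused."""
    op: str
    platform: str
    config: Config
    latency_ms: float


class SkillLibrary:
    """Maps (operator, platform) to the best configuration found so far."""

    def __init__(self) -> None:
        self._skills: Dict[Tuple[str, str], Skill] = {}

    def tune(
        self,
        op: str,
        platform: str,
        measure: Callable[[Config], float],
        tiles: Sequence[int] = (16, 32, 64),
        unrolls: Sequence[int] = (1, 2, 4),
    ) -> Skill:
        # Exhaustive grid search; an agent would explore a larger
        # space more selectively than this stand-in does.
        candidates = (
            Skill(op, platform, cfg, measure(cfg))
            for cfg in product(tiles, unrolls)
        )
        best = min(candidates, key=lambda s: s.latency_ms)
        self._skills[(op, platform)] = best
        return best

    def get(self, op: str, platform: str) -> Skill:
        # Direct lookup of a previously tuned skill; no re-tuning.
        return self._skills[(op, platform)]


def toy_latency(cfg: Config) -> float:
    """Assumed cost model; a real setup would time the kernel on the device."""
    tile, unroll = cfg
    return abs(tile - 32) * 0.1 + abs(unroll - 2) * 0.5 + 1.0


library = SkillLibrary()
best = library.tune("matmul", "orin", toy_latency)
print(best.config, best.latency_ms)
```

Keying the library on both operator and platform reflects the cross-platform framing: the same operator can carry a different tuned configuration on x86, Orin, or Journey, and inference code only performs the lookup.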
Problem

Research questions and friction points this paper is trying to address.

vision-language models
edge inference
low latency
resource constraints
cross-platform adaptability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Edge Inference
Vision-Language Models
Agent-Driven Optimization
Cross-Platform Deployment
Kernel Tuning
Authors

Mengling Deng, Go Further. AI
Yuanpeng Chen, Go Further. AI
Sheng Yang, School of Data Science, Fudan University
Wei Tao, Huazhong University of Science and Technology (Quantization, LLM, Time-Series)
Wenhai Zhang, Go Further. AI
Hui Song, Go Further. AI
Linyuanhao Qin, School of Data Science, Fudan University
Kai Zhao, Go Further. AI
Xiaojun Ye, Go Further. AI
Shanhui Mo, Independent Researcher
Jingli Fan, Go Further. AI
Shuang Zhang, Chair Professor, University of Hong Kong (metamaterials, topological photonics, metasurfaces, plasmonics, nonlinear optics)
Bei Liu, Go Further. AI
Tiankun Zhao, Go Further. AI
Xiangjing An, Go Further. AI