Characterizing VLA Models: Identifying the Action Generation Bottleneck for Edge AI Architectures

📅 2026-02-28
🤖 AI Summary
This study addresses the latency challenges of deploying Vision-Language-Action (VLA) models on edge devices, where slow inference prevents real-time operation. Using the MolmoAct-7B model, the authors systematically characterize performance on the NVIDIA Jetson Orin and Thor platforms and find that the memory-bound action-generation phase accounts for up to 75% of end-to-end latency. Through analytical modeling and simulation, they project the hardware requirements for scaling to 100B-parameter models and assess high-bandwidth memory and processing-in-memory (PIM) as candidate architectures for next-generation edge AI systems.

📝 Abstract
Vision-Language-Action (VLA) models are an emerging class of workloads critical for robotics and embodied AI at the edge. As these models scale, they demonstrate significant capability gains, yet they must be deployed locally to meet the strict latency requirements of real-time applications. This paper characterizes VLA performance on two generations of edge hardware, namely the NVIDIA Jetson Orin and Thor platforms. Using MolmoAct-7B, a state-of-the-art VLA model, we identify a primary execution bottleneck: up to 75% of end-to-end latency is consumed by the memory-bound action-generation phase. Through analytical modeling and simulations, we project the hardware requirements for scaling to 100B-parameter models. We also explore the impact of high-bandwidth memory technologies and processing-in-memory (PIM) as promising future pathways in edge systems for embodied AI.
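
To illustrate why a memory-bound action-generation phase ties latency to memory bandwidth, the sketch below applies a simple roofline-style lower bound. It is not the paper's analytical model. It assumes batch size 1, that every generated token streams all model weights from memory once, and that KV-cache and activation traffic are negligible. The function names and every numeric input (bandwidth, precision, token count, latency budget) are hypothetical placeholders, not measurements from the paper.

```python
def decode_latency_s(n_params, bytes_per_param, mem_bw_bytes_per_s, n_tokens):
    """Lower-bound latency (seconds) for a memory-bound autoregressive phase.

    Assumption: each generated token reads every weight from memory once,
    so time per token is (model size in bytes) / (memory bandwidth).
    """
    bytes_per_token = n_params * bytes_per_param
    return n_tokens * bytes_per_token / mem_bw_bytes_per_s


def required_bandwidth_bytes_per_s(n_params, bytes_per_param, n_tokens,
                                   latency_budget_s):
    """Memory bandwidth needed to finish n_tokens within latency_budget_s,
    under the same weight-streaming assumption."""
    return n_tokens * n_params * bytes_per_param / latency_budget_s


if __name__ == "__main__":
    # Illustrative inputs only: a 7B-parameter model at 2 bytes per weight
    # on a device with an assumed 200 GB/s of memory bandwidth.
    per_token = decode_latency_s(7e9, 2, 200e9, n_tokens=1)
    print(f"7B model, per-token lower bound: {per_token * 1e3:.0f} ms")

    # Bandwidth a 100B-parameter model would need for one token in 100 ms.
    bw = required_bandwidth_bytes_per_s(100e9, 2, n_tokens=1,
                                        latency_budget_s=0.1)
    print(f"100B model, required bandwidth: {bw / 1e12:.1f} TB/s")
```

Under these assumptions latency scales linearly with parameter count and inversely with bandwidth, which is why higher-bandwidth memory and PIM are natural candidates for larger models.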
Problem

Research questions and friction points this paper is trying to address.

Vision-Language-Action models
edge AI
action generation bottleneck
latency
memory-bound
Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision-Language-Action models
action generation bottleneck
edge AI architectures
processing-in-memory
memory-bound latency
Manoj Vishwanathan
Google, Mountain View, CA, USA; Purdue University, West Lafayette, IN, USA
Suvinay Subramanian
Google
Computer Systems
Anand Raghunathan
Purdue University, West Lafayette, IN, USA