🤖 AI Summary
This study addresses the severe latency challenges of deploying Vision-Language-Action (VLA) models on edge devices, which hinder real-time performance. Focusing on the MolmoAct-7B model, we conduct a systematic performance characterization on the NVIDIA Jetson Orin and Thor platforms and identify, for the first time, that the action-generation phase constitutes a memory-bound bottleneck, accounting for up to 75% of end-to-end latency. Using analytical modeling and simulation-based projection, we evaluate the potential of high-bandwidth memory (HBM) and processing-in-memory (PIM) architectures to support future VLA models with tens of billions of parameters, quantifying the hardware capabilities required for next-generation edge AI systems.
📝 Abstract
Vision-Language-Action (VLA) models are an emerging class of workloads critical for robotics and embodied AI at the edge. As these models scale, they demonstrate significant capability gains, yet they must be deployed locally to meet the strict latency requirements of real-time applications. This paper characterizes VLA performance on two generations of edge hardware, namely the NVIDIA Jetson Orin and Thor platforms. Using MolmoAct-7B, a state-of-the-art VLA model, we identify a primary execution bottleneck: up to 75% of end-to-end latency is consumed by the memory-bound action-generation phase. Through analytical modeling and simulations, we project the hardware requirements for scaling to 100B-parameter models. We also explore high-bandwidth memory technologies and processing-in-memory (PIM) as promising pathways for future edge systems serving embodied AI.
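The projection described in the abstract rests on a first-order, bandwidth-bound view of autoregressive action generation: when decode is memory-bound, each generated token must stream the model weights from DRAM, so latency scales with model size over memory bandwidth. The sketch below illustrates that style of analytical model only; the function name, byte counts, bandwidth figure, and token count are illustrative assumptions, not the paper's exact parameters or results.

```python
# Minimal sketch of a bandwidth-bound decode-latency estimate (illustrative only).
# Assumptions (not from the paper): weight traffic dominates, FP16 weights,
# and each generated action token requires one full pass over the weights.

def decode_latency_s(n_params: float, bytes_per_param: float,
                     mem_bw_gbps: float, n_tokens: int) -> float:
    """Estimate action-generation latency when decode is memory-bound:
    every token reads all model weights once from main memory."""
    bytes_per_token = n_params * bytes_per_param          # bytes moved per token
    return n_tokens * bytes_per_token / (mem_bw_gbps * 1e9)

# Illustrative numbers: a 7B-parameter model in FP16, ~200 GB/s of memory
# bandwidth (roughly Jetson-class), and 64 generated tokens per action step.
print(f"{decode_latency_s(7e9, 2, 200, 64):.2f} s")  # -> 4.48 s for this example
```

Under this simple model, scaling parameters by roughly 10x at fixed bandwidth scales decode latency by the same factor, which is why the abstract points to higher-bandwidth memory and PIM as the levers for larger edge VLA models.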