🤖 AI Summary
Optimizing multi-accelerator orchestration—across GPU CUDA cores, Tensor Cores, and the Deep Learning Accelerator (DLA)—for ResNet50 multi-instance inference on resource-constrained NVIDIA Jetson AGX Orin edge devices remains challenging due to heterogeneous hardware constraints and inter-accelerator contention.
Method: We conduct systematic empirical measurements across diverse accelerator combinations and batch sizes, quantifying throughput–latency trade-offs under realistic edge deployment conditions.
Contribution/Results: Our analysis reveals that CUDA Core + Tensor Core collaboration achieves peak throughput, whereas integrating the DLA degrades overall performance due to memory bandwidth saturation and instruction-scheduling conflicts. We propose a hardware-aware cooperative scheduling framework grounded in measured architectural characteristics. This framework provides empirically validated insights and actionable design guidelines for heterogeneous accelerator resource allocation and runtime scheduling in edge AI platforms, bridging the gap between theoretical acceleration potential and practical system-level efficiency.
📝 Abstract
Edge devices like the NVIDIA Jetson increasingly feature multiple on-board accelerators, such as GPU CUDA cores, Tensor Cores, and Deep Learning Accelerators (DLAs). Maximizing the DNN inference performance of such devices requires using these co-located hardware components concurrently, but this has not yet been studied. We analyze the performance of the accelerators present in the Jetson AGX Orin, both independently and concurrently, using multiple instances of the ResNet50 model. We assess the effects of different accelerator combinations and varying batch sizes on inference throughput and latency. Our results indicate that using CUDA cores with Tensor Cores offers higher throughput, while using them in conjunction with DLAs reduces the benefits. This paves the way to explore more intelligent configurations that maximize the performance of edge platforms for AI workloads.
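The concurrent multi-instance measurement described in the abstract can be sketched as a simple timing harness. This is a minimal illustration, not the paper's actual methodology: the `run_inference` stub below is hypothetical and stands in for real TensorRT engine execution on the GPU (CUDA/Tensor Cores) or a DLA core; on a Jetson one would instead invoke a deserialized engine built for each target accelerator.

```python
import threading
import time
import statistics

def run_inference(batch_size):
    # Hypothetical stand-in for executing a ResNet50 TensorRT engine
    # on a target accelerator (CUDA cores, Tensor Cores, or a DLA).
    time.sleep(0.002)  # simulate per-batch execution time

def worker(batch_size, n_batches, latencies):
    # Each worker represents one concurrent model instance.
    for _ in range(n_batches):
        t0 = time.perf_counter()
        run_inference(batch_size)
        latencies.append(time.perf_counter() - t0)  # list.append is thread-safe in CPython

def benchmark(num_instances=2, batch_size=8, n_batches=50):
    """Run `num_instances` model instances concurrently and report
    aggregate throughput (images/s) and median per-batch latency (s)."""
    latencies = []
    threads = [
        threading.Thread(target=worker, args=(batch_size, n_batches, latencies))
        for _ in range(num_instances)
    ]
    t0 = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - t0
    throughput = num_instances * n_batches * batch_size / elapsed
    return throughput, statistics.median(latencies)

if __name__ == "__main__":
    thr, lat = benchmark()
    print(f"throughput: {thr:.0f} img/s, median latency: {lat * 1000:.2f} ms")
```

Sweeping `num_instances` and `batch_size` over different accelerator assignments yields the throughput-latency trade-off curves the study measures; with real engines, contention on shared memory bandwidth would surface as latency inflation at higher concurrency.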