🤖 AI Summary
In containerized cloud environments, RDMA improves performance but suffers from severe performance isolation failures: contention for microarchitectural resources inside the RNIC leaves it vulnerable to resource-exhaustion attacks such as state saturation and pipeline saturation. This work presents the first empirical demonstration of container-granular RDMA microarchitectural vulnerabilities on NVIDIA BlueField-3 DPUs. Experiments show that state saturation attacks can cost victim containers up to 93.9% of their bandwidth, inflate latency by up to 1,117×, and raise cache misses by 115%, while pipeline saturation attacks cause severe link-level congestion. To counter these attacks, we propose HT-Verbs, a hardware-agnostic, software-defined isolation framework that uses real-time per-container verb-level telemetry and adaptive resource heat classification (hot/warm/cold) to implement a three-tier isolation mechanism. HT-Verbs combines dynamic threshold adjustment with fine-grained bandwidth throttling, requires no hardware modification, and aims to restore performance predictability.
📝 Abstract
In modern containerized cloud environments, the adoption of RDMA (Remote Direct Memory Access) has expanded to reduce CPU overhead and enable high-performance data exchange. Achieving this requires strong performance isolation to ensure that one container's RDMA workload does not degrade the performance of others, thereby maintaining critical security assurances. However, existing isolation techniques are difficult to apply effectively due to the complexity of microarchitectural resource management within RDMA NICs (RNICs). This paper experimentally analyzes two types of resource exhaustion attacks on NVIDIA BlueField-3: (i) state saturation attacks and (ii) pipeline saturation attacks. Our results show that state saturation attacks can cause up to a 93.9% loss in bandwidth, a 1,117x increase in latency, and a 115% rise in cache misses for victim containers, while pipeline saturation attacks lead to severe link-level congestion and significant amplification, where small verb requests result in disproportionately high resource consumption. To mitigate these threats and restore predictable security assurances, we propose HT-Verbs, a threshold-driven framework based on real-time per-container RDMA verb telemetry and adaptive resource classification that partitions RNIC resources into hot, warm, and cold tiers and throttles abusive workloads without requiring hardware modifications.
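The tiering idea described above can be illustrated with a minimal sketch. It is not the HT-Verbs implementation: the abstract does not give the classification rule, so the `VerbSample` type, the `classify_and_throttle` function, the `capacity_vps` parameter, and the 0.5 and 1.0 thresholds are all hypothetical assumptions. The sketch also uses static thresholds, whereas the paper describes them as adaptive.

```python
from dataclasses import dataclass

# Hypothetical tier boundaries, as fractions of a container's fair share.
# Illustrative assumptions only; the paper's thresholds are not given here.
COLD_FRACTION = 0.5
WARM_FRACTION = 1.0


@dataclass
class VerbSample:
    container: str
    verbs_per_sec: float  # verbs posted in the last telemetry window, per second


def classify_and_throttle(samples, capacity_vps):
    """Assign each container a tier and an optional verbs-per-second cap.

    capacity_vps is an assumed, operator-configured verb-rate budget for the
    RNIC. The fair share is that budget split evenly across containers.
    Returns {container: (tier, cap)}, where cap is None for uncapped tiers.
    """
    if not samples:
        return {}
    fair_share = capacity_vps / len(samples)
    result = {}
    for s in samples:
        if s.verbs_per_sec <= COLD_FRACTION * fair_share:
            result[s.container] = ("cold", None)
        elif s.verbs_per_sec <= WARM_FRACTION * fair_share:
            result[s.container] = ("warm", None)
        else:
            # Hot containers exceed their fair share and are capped back to it.
            result[s.container] = ("hot", fair_share)
    return result


if __name__ == "__main__":
    window = [
        VerbSample("attacker", 9000.0),
        VerbSample("victim", 300.0),
        VerbSample("batch", 1500.0),
    ]
    for name, (tier, cap) in classify_and_throttle(window, 6000.0).items():
        print(name, tier, cap)
```

Dividing a configured budget evenly, rather than comparing each container with the observed mean, keeps a single dominant container from raising the threshold that is meant to catch it.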