🤖 AI Summary
This work addresses a key limitation of existing virtual disk placement strategies for cloud block storage: they optimize only for spatial load balancing and overlook transient congestion that arises when tenants' I/O load peaks align in time. The authors propose TIDAL, which recovers application-level temporal phase information from tenants' naming metadata (such as project, VM, and disk names), using large language models (LLMs) for semantic understanding. To meet control-plane latency constraints, TIDAL uses an offline-to-online design built on lightweight teacher–student knowledge distillation, enabling CPU-only inference with millisecond-level latency. This makes phase-complementary placement possible, co-locating disks whose peaks are offset, even for newly provisioned disks with no load history (the cold-start case), which substantially mitigates congestion. In evaluations driven by production traces, the approach reduces overload frequency by 79.1% and P95 overload duration by 73.7% compared with the strongest baselines.
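A toy calculation shows why spatial balancing alone can miss this. The sketch uses hypothetical six-slot daily load profiles (all numbers are made up for illustration). After placement, both candidate pods have the same average load, so a mean-based balancer cannot tell them apart, but their peaks differ sharply. `combine`, `mean`, and `pick_by_peak` are illustrative helpers, not part of TIDAL or any published system.

```python
from typing import Dict, List, Sequence


def combine(pod: Sequence[float], disk: Sequence[float]) -> List[float]:
    """Per-slot load of a pod after a new disk is added to it."""
    return [p + d for p, d in zip(pod, disk)]


def mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


def pick_by_peak(pods: Dict[str, List[float]], disk: Sequence[float]) -> str:
    """Temporal view: choose the pod with the lowest post-placement peak."""
    return min(pods, key=lambda name: max(combine(pods[name], disk)))


# Hypothetical 6-slot daily load profiles (made-up numbers for illustration).
pods: Dict[str, List[float]] = {
    "pod_a": [10, 10, 40, 40, 10, 10],  # existing disks peak mid-day
    "pod_b": [40, 40, 10, 10, 10, 10],  # existing disks peak at night
}
new_disk = [5, 5, 30, 30, 5, 5]  # the new disk also peaks mid-day

for name, load in pods.items():
    after = combine(load, new_disk)
    # Both pods report the same mean, but very different peaks.
    print(name, "mean:", round(mean(after), 2), "peak:", max(after))

print("peak-aware choice:", pick_by_peak(pods, new_disk))
```

After placement, both pods have a mean of about 33.33, but their peaks are 70 and 45. A mean-only balancer sees a tie. The peak-aware choice co-locates the mid-day disk with the night-heavy `pod_b`, which is the kind of complementary placement the summary describes.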
📝 Abstract
Cloud Virtual Disk (CVD) placement in Cloud Block Storage (CBS) is critical for resource efficiency and performance isolation. Existing schemes prioritize spatial load balancing by dispersing disks across pods based on configuration-derived load estimates. However, overload risk in CBS is fundamentally temporal. Even when average load is balanced, pods can still suffer transient congestion when the peaks of co-located disks align in time. Achieving complementary placement, which co-locates CVDs with offset peaks, is hard at provisioning time because new disks have no history from which to infer temporal phase. We present TIDAL, a CVD placement framework that recovers phase-aware signals for cold-start placement from an underused source: tenant-provided names and identifiers in provisioning metadata. TIDAL first uses LLMs to recover application semantics from noisy metadata such as project, VM, and disk names. It then translates these semantics into phase-aware temporal signals to guide complementary placement. To satisfy control-plane constraints, TIDAL adopts an offline-to-online design with teacher-student distillation, regex-based filtering, and prefix-aware caching, enabling CPU-only inference with millisecond-level latency. Evaluations driven by production traces show that TIDAL reduces overload frequency by 79.1% and P95 overload duration by 73.7% compared with the strongest baselines.
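The cold-start pipeline described in the abstract has three steps: clean noisy names, infer a temporal phase, and place the disk complementarily. The following is a minimal sketch of those steps under explicit assumptions, and none of its names or rules come from TIDAL itself. The keyword table `KEYWORD_PHASE` is a hypothetical stand-in for the LLM teacher and distilled student model. `PHASE_TEMPLATES` holds made-up six-slot profiles. The regex filter simply strips UUIDs and digit runs. Finally, an `lru_cache` keyed on the normalized name is a simplified stand-in for prefix-aware caching, so that names such as `web-prod-01` and `web-prod-02` share one cache entry.

```python
import re
from functools import lru_cache
from typing import Dict, List, Sequence, Tuple

# Hypothetical 6-slot phase templates (illustrative values, not from TIDAL).
PHASE_TEMPLATES: Dict[str, Tuple[float, ...]] = {
    "daytime": (1.0, 1.0, 8.0, 8.0, 1.0, 1.0),  # e.g. interactive web/API traffic
    "nightly": (8.0, 8.0, 1.0, 1.0, 1.0, 1.0),  # e.g. backups or batch ETL
    "flat": (3.0, 3.0, 3.0, 3.0, 3.0, 3.0),     # fallback when no phase signal
}

# Hypothetical keyword rules standing in for the LLM teacher / student model.
# Checked in order; the first matching keyword wins.
KEYWORD_PHASE: List[Tuple[str, str]] = [
    ("backup", "nightly"), ("etl", "nightly"), ("batch", "nightly"),
    ("web", "daytime"), ("api", "daytime"), ("shop", "daytime"),
]

# Regex filter (assumed): UUIDs and digit runs carry no application semantics.
NOISE = re.compile(r"[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}|\d+")


def normalize(name: str) -> str:
    """Lowercase a tenant-provided name, strip ID-like noise, tidy separators."""
    cleaned = NOISE.sub("", name.lower())
    return re.sub(r"[-_.\s]+", "-", cleaned).strip("-")


@lru_cache(maxsize=4096)
def infer_phase(normalized: str) -> str:
    """Map a normalized name to a phase label; cached so repeats skip inference."""
    for keyword, phase in KEYWORD_PHASE:
        if keyword in normalized:
            return phase
    return "flat"


def predict_profile(project: str, vm: str, disk: str) -> Tuple[float, ...]:
    """Cold-start load profile for a new disk, from its naming metadata only."""
    key = normalize(f"{project}-{vm}-{disk}")
    return PHASE_TEMPLATES[infer_phase(key)]


def place(pods: Dict[str, List[float]], profile: Sequence[float]) -> str:
    """Choose the pod whose peak load after adding `profile` is lowest."""
    def peak_after(name: str) -> float:
        return max(p + d for p, d in zip(pods[name], profile))
    return min(pods, key=peak_after)


# Usage with made-up pod profiles.
pods: Dict[str, List[float]] = {
    "pod-1": [2.0, 2.0, 9.0, 9.0, 2.0, 2.0],  # existing load peaks mid-day
    "pod-2": [9.0, 9.0, 2.0, 2.0, 2.0, 2.0],  # existing load peaks at night
}
web = predict_profile("shop-frontend", "web-prod-07", "data-disk-1")
backup = predict_profile("finance", "backup-vm", "vol")
print(place(pods, web))     # pod-2
print(place(pods, backup))  # pod-1
```

In this run, the web-facing disk lands on `pod-2`, whose existing load peaks at night, and the backup disk lands on `pod-1`. Minimizing the post-placement peak is one simple complementary-placement objective. It was chosen here for clarity and is not taken from the paper.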