ARCHES: Adaptive Real-Time Switching of AI Models for the RAN

📅 2026-04-25

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing AI models for wireless environments struggle to be both general and efficient: general-purpose models are limited in performance, while specialized models outperform conventional algorithms only under specific conditions, and running them unconditionally incurs excessive compute and energy cost. The paper proposes ARCHES, a GPU-accelerated adaptive physical-layer framework that switches between AI-based and conventional signal processing experts at slot boundaries based on real-time network state. It combines a lightweight CUDA switch kernel for zero-gap output selection, a dApp-based control plane driven by cross-layer telemetry, and a reusable policy design process, achieving a control-loop latency of approximately 140 microseconds with sub-microsecond decision inference. Implemented on the X5G platform with NVIDIA Aerial and OpenAirInterface, and validated on uplink channel estimation (switching between an AI-based and an MMSE estimator), it achieves median uplink PHY throughput gains of 5.32% and 7.23% under good and poor channel conditions, respectively. Under good conditions, defaulting to MMSE rather than always running the AI estimator saves 15.8 W of GPU power (9.6%) and 17 percentage points of GPU utilization.

📝 Abstract

Artificial Intelligence (AI) has become a powerful tool for model-free Radio Access Network (RAN) signal processing and optimization. However, designing a single model that generalizes across all radio environments is challenging. Specialized AI models outperform conventional algorithms only under specific conditions, while their higher compute and energy cost makes unconditional execution impractical at the base station. This creates a need for real-time expert switching: dynamically activating the most appropriate AI or conventional expert based on current network conditions. To address this, we propose ARCHES (Adaptive Real-time CUDA Hot-swapping of Experts in the RAN Stack), a framework hosting multiple AI-based and conventional signal processing experts within a GPU-accelerated PHY pipeline, dynamically selecting the most appropriate expert at slot-boundary granularity without dropping or corrupting in-flight data. ARCHES includes a lightweight CUDA switch kernel for zero-gap output selection, a dApp-based control plane that collects cross-layer telemetry and drives the switching policy, and a reusable process for policy design based on controlled perturbation, monotonicity filtering, and hierarchical clustering. We validate ARCHES on UL channel estimation, switching between an AI-based and a Minimum Mean Square Error (MMSE) estimator under changing propagation and interference conditions. Implemented on the X5G platform with NVIDIA Aerial and OpenAirInterface (OAI), ARCHES achieves median UL PHY throughput gains of 5.32% and 7.23% under good and poor conditions, with a control-loop latency of ~140 us and sub-microsecond decision inference. Under good conditions, defaulting to MMSE saves 15.8 W of GPU power (9.6%) and 17 percentage points of GPU utilization versus unconditional AI execution, validating the performance-per-watt tradeoff that motivates adaptive expert selection.
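The slot-boundary switching idea in the abstract can be illustrated with a small host-side sketch. This is not the ARCHES implementation, which uses a CUDA switch kernel inside the Aerial PHY pipeline. The class `SlotSwitch`, the stub experts, the `toy_policy` function, and the 10 dB SINR threshold are all hypothetical and chosen only for illustration. The sketch shows one property: a request from the control plane is held as pending and committed only when the next slot begins, so the slot in flight always completes with the expert it started with.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical expert signature: one slot of samples in, one estimate out.
Expert = Callable[[List[float]], List[float]]


def mmse_stub(samples: List[float]) -> List[float]:
    """Placeholder for a conventional MMSE estimator (assumption, not real DSP)."""
    return list(samples)


def ai_stub(samples: List[float]) -> List[float]:
    """Placeholder for an AI-based estimator (assumption, not a real model)."""
    return list(samples)


@dataclass
class SlotSwitch:
    experts: Dict[str, Expert]
    active: str                      # expert used for the slot being processed
    pending: Optional[str] = None    # control-plane request, applied at next boundary
    history: List[str] = field(default_factory=list)

    def request(self, name: str) -> None:
        """Record a switch request without touching the in-flight slot."""
        if name not in self.experts:
            raise KeyError(name)
        self.pending = name

    def process_slot(self, samples: List[float]) -> List[float]:
        # Slot boundary: commit any pending request before the slot starts.
        if self.pending is not None:
            self.active = self.pending
            self.pending = None
        self.history.append(self.active)
        return self.experts[self.active](samples)


def toy_policy(sinr_db: float, threshold_db: float = 10.0) -> str:
    """Hypothetical policy: cheap MMSE in good conditions, AI otherwise."""
    return "mmse" if sinr_db >= threshold_db else "ai"


def run_demo() -> List[str]:
    switch = SlotSwitch({"mmse": mmse_stub, "ai": ai_stub}, active="mmse")
    # Telemetry observed during slot n drives the expert used in slot n + 1.
    for sinr_db in [20.0, 18.0, 4.0, 3.0, 15.0]:
        switch.process_slot([0.0, 1.0])
        switch.request(toy_policy(sinr_db))
    return switch.history


if __name__ == "__main__":
    print(run_demo())
```

In this toy run the SINR drop observed during the third slot takes effect in the fourth, so the history is three MMSE slots followed by two AI slots. The actual ARCHES policy is derived from cross-layer telemetry through perturbation, monotonicity filtering, and clustering rather than a single fixed threshold.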

Problem

Research questions and friction points this paper is trying to address.

AI model switching

Radio Access Network

real-time adaptation

compute efficiency

signal processing

Innovation

Methods, ideas, or system contributions that make the work stand out.

adaptive expert switching

real-time AI inference

GPU-accelerated RAN