Contextual Bandits for Resource-Constrained Devices using Probabilistic Learning

📅 2026-05-13

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Deploying contextual bandits (CB) on memory-, compute-, and energy-constrained devices is hindered by high update overhead. This work proposes a CB approach based on hyperdimensional computing (HDC) whose probabilistic update rule preserves magnitude information while substantially reducing computational cost. By combining low-precision representations, stochastic subset updates, a time-decaying update probability, and component values bounded to a fixed range, the method avoids numerical overflow and eliminates the need for periodic binarization. Off-policy evaluation on standardized synthetic CB benchmarks shows that the proposed algorithm approaches the performance of the original HD-CB with as few as 3 bits per vector component, and outperforms binarized HD-CB at equal precision.

📝 Abstract

Contextual bandits (CB) are online sequential decision-making problems under partial feedback that underpin many adaptive services. There is a growing demand to deploy CB agents directly on-device, under strict constraints on memory, compute, and energy. However, standard linear CB algorithms are often impractical for resource-constrained devices because their computational and memory costs scale unfavorably. Recently, HD-CB, a CB approach based on hyperdimensional computing principles, has been proposed to model and solve CB problems by moving into high-dimensional spaces. HD-CB offers faster convergence, favorable scalability, and improved memory efficiency compared to linear CB algorithms. However, its learning rule is accumulation-based: the values of action vectors grow over time, requiring high precision. While periodic binarization can prevent overflow in low-precision components, it may discard important information about magnitudes and degrade decision quality. This paper introduces probabilistic HD-CB, a low-precision variant that replaces deterministic accumulation with a probabilistic update rule. At each step, only a random subset of vector components is updated, with a time-decaying update probability, and component values are constrained to a predefined range [-k,+k]. This approach enables low-precision components, prevents overflow without periodic binarization, and reduces the expected update cost in proportion to the fraction of updated components. Off-policy evaluation on standardized synthetic CB benchmarks using the Open Bandit Pipeline shows that probabilistic HD-CB consistently outperforms binarized HD-CB at equal precision, while approaching the performance of HD-CB with as few as 3 bits per component.
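The update rule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not give the decay schedule, the reward encoding, or how k maps to bit width, so the following are all assumptions made for illustration: the schedule `p0 / (1 + decay * step)`, the bipolar (+1/-1) context vector, the `reward_sign` multiplier, the `bits_needed` helper, and every function and parameter name.

```python
import random


def update_probability(step, p0=0.5, decay=0.01):
    """Hypothetical time-decaying schedule: p0 / (1 + decay * step)."""
    return p0 / (1.0 + decay * step)


def probabilistic_update(action_vec, context_vec, reward_sign, step,
                         k=3, p0=0.5, decay=0.01, rng=random):
    """Update a random subset of components in place, saturating at [-k, +k].

    Each component is selected independently with the current update
    probability. Returns the number of components that were selected.
    """
    p = update_probability(step, p0, decay)
    touched = 0
    for i, c in enumerate(context_vec):
        if rng.random() < p:
            touched += 1
            new_val = action_vec[i] + reward_sign * c
            # Clamp instead of letting the value grow without bound.
            action_vec[i] = max(-k, min(k, new_val))
    return touched


def bits_needed(k):
    """Bits required to store any integer in [-k, +k] (2k + 1 levels)."""
    return (2 * k).bit_length()


# Small demonstration with assumed sizes: 200 steps on a 1000-dim vector.
rng = random.Random(0)
dim = 1000
action = [0] * dim
touched_per_step = []
for t in range(200):
    context = [rng.choice((-1, 1)) for _ in range(dim)]
    n = probabilistic_update(action, context, reward_sign=1, step=t,
                             k=3, rng=rng)
    touched_per_step.append(n)
```

Because values saturate at the bounds instead of accumulating, no component ever leaves [-k, +k], and the expected work per step is the update probability times the dimension. Under this sketch's own encoding, k = 3 fits in 3 bits per component. Whether that matches the paper's 3-bit configuration is not stated in the abstract.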

Problem

Research questions and friction points this paper is trying to address.

Contextual Bandits

Resource-Constrained Devices

Hyperdimensional Computing

Low-Precision Learning

Online Decision-Making

Innovation

Methods, ideas, or system contributions that make the work stand out.

probabilistic HD-CB

hyperdimensional computing

low-precision learning