🤖 AI Summary
In edge intelligence scenarios where, for binary classification, false negatives are far costlier than false positives, this paper proposes a hierarchical inference framework that couples a lightweight local model with a large remote model to dynamically balance classification accuracy against offloading communication overhead. The core contribution is H2T2, an online double-threshold strategy that requires no retraining, is model-agnostic, and relies solely on sample-wise confidence scores. H2T2 provably achieves sublinear regret and adapts robustly to model miscalibration and data distribution shifts, optimizing in real time from limited feedback and thus eliminating offline hyperparameter tuning. Evaluated on multiple real-world datasets, H2T2 consistently outperforms single-threshold baselines and, in several settings, approaches or even surpasses offline optimal solutions, demonstrating strong robustness and generalization across diverse edge deployment conditions.
📝 Abstract
We focus on a binary classification problem in an edge intelligence system where false negatives are more costly than false positives. The system pairs a compact, locally deployed model with a larger remote model that is accessible over the network at an offloading cost. For each sample, the system first runs the local model; based on its output, the sample may be offloaded to the remote model. This work aims to understand the fundamental trade-off between classification accuracy and offloading cost in such a hierarchical inference (HI) system. To optimize the system, we propose an online learning framework that continuously adapts a pair of thresholds on the local model's confidence scores; these thresholds jointly determine the local model's prediction and whether a sample is classified locally or offloaded to the remote model. We present a closed-form solution for the setting where the local model is calibrated. For the more general case of uncalibrated models, we introduce H2T2, an online two-threshold hierarchical inference policy, and prove that it achieves sublinear regret. H2T2 is model-agnostic, requires no training, and learns during the inference phase from limited feedback. Simulations on real-world datasets show that H2T2 consistently outperforms naive and single-threshold HI policies, sometimes even surpassing offline optima. The policy is also robust to distribution shifts and adapts effectively to mismatched classifiers.
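The two-threshold decision rule described above can be sketched as follows. This is a minimal illustration, not the paper's exact algorithm: the threshold names `theta_lo`/`theta_hi` and the convention that high confidence maps to a local positive prediction, low confidence to a local negative, and the middle band to offloading are plausible assumptions inferred from the abstract.

```python
def hi_decision(score: float, theta_lo: float, theta_hi: float):
    """Illustrative two-threshold hierarchical-inference rule.

    `score` is the local model's confidence for the positive class.
    Assumed convention (not from the paper's text): confident samples
    are classified locally, ambiguous ones are offloaded at a cost.
    Returns (route, prediction), where prediction is None if offloaded.
    """
    assert theta_lo <= theta_hi
    if score >= theta_hi:
        return ("local", 1)    # confident positive: classify locally
    if score <= theta_lo:
        return ("local", 0)    # confident negative: classify locally
    return ("offload", None)   # ambiguous band: query the remote model
```

In an asymmetric-cost setting, one would expect the learned `theta_lo` to sit well below 0.5 so that borderline-negative samples, whose misclassification as false negatives is expensive, are offloaded rather than rejected locally.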