Online Learning for Multi-Layer Hierarchical Inference under Partial and Policy-Dependent Feedback

📅 2026-03-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses multi-layer hierarchical inference systems in which only the terminal layer provides sparse, routing-policy-dependent prediction-error feedback, a setting where conventional importance-weighted methods fail due to variance amplification. To overcome this, the authors propose a framework that integrates a variance-reduced EXP4 algorithm with Lyapunov optimization. By constructing a recursive loss structure and incorporating a stability mechanism, the approach yields unbiased loss estimates and stable online learning of routing policies under long-term resource constraints. Theoretically, the method achieves a sublinear regret bound relative to the best fixed policy in hindsight, and empirical evaluations on large-scale multi-task workloads demonstrate improved learning stability and inference performance.
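The variance amplification described above can be seen in a toy simulation (illustrative only, not the paper's setup): an inverse-propensity estimator of a fixed loss stays unbiased as the feedback probability p decays with depth, but its variance grows roughly like (1 - p)/p.

```python
import random

def ips_estimates(true_loss, p, n, rng):
    """Inverse-propensity (importance-weighted) estimates of a fixed loss
    when the loss is observed only with probability p."""
    return [true_loss / p if rng.random() < p else 0.0 for _ in range(n)]

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

rng = random.Random(0)
# Suppose observability decays geometrically with depth, e.g. p_d = 0.5**d
# (illustrative numbers, not taken from the paper).
for depth in range(1, 5):
    p = 0.5 ** depth
    est = ips_estimates(1.0, p, 100_000, rng)
    # The mean stays near the true loss (unbiased), but the variance
    # grows roughly like (1 - p) / p as p shrinks.
    print(f"depth={depth}  p={p:.4f}  mean={mean(est):.3f}  var={variance(est):.2f}")
```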

📝 Abstract
Hierarchical inference systems route tasks across multiple computational layers, where each node may either finalize a prediction locally or offload the task to a node in the next layer for further processing. Learning optimal routing policies in such systems is challenging: inference loss is defined recursively across layers, while feedback on prediction error is revealed only at a terminal oracle layer. This induces a partial, policy-dependent feedback structure in which observability probabilities decay with depth, causing importance-weighted estimators to suffer from amplified variance. We study online routing for multi-layer hierarchical inference under long-term resource constraints and terminal-only feedback. We formalize the recursive loss structure and show that naive importance-weighted contextual bandit methods become unstable as feedback probability decays along the hierarchy. To address this, we develop a variance-reduced EXP4-based algorithm integrated with Lyapunov optimization, yielding unbiased loss estimation and stable learning under sparse and policy-dependent feedback. We provide regret guarantees relative to the best fixed routing policy in hindsight and establish near-optimality under stochastic arrivals and resource constraints. Experiments on large-scale multi-task workloads demonstrate improved stability and performance compared to standard importance-weighted approaches.
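As a rough illustration of the ingredients named in the abstract — an exponential-weights (EXP4-style) update over expert routing policies, importance-weighted loss estimates under bandit feedback, and a Lyapunov virtual queue for a long-term resource budget — here is a minimal sketch. The toy environment, parameter values, and budget are assumptions for illustration; this is not the paper's exact algorithm.

```python
import math
import random

def exp4_lyapunov_step(weights, expert_advice, eta, V, queue, costs, budget, rng):
    """One round of an EXP4-style update with a Lyapunov virtual queue.

    expert_advice[e][a] = probability expert e assigns to action a.
    costs[a]            = resource cost of action a.
    queue               = virtual queue tracking constraint violation.
    V                   = drift-plus-penalty trade-off parameter.
    All names and values are illustrative, not the paper's algorithm.
    """
    n_actions = len(costs)
    # Mix expert advice into an action distribution.
    total_w = sum(weights)
    probs = [sum(w * adv[a] for w, adv in zip(weights, expert_advice)) / total_w
             for a in range(n_actions)]
    # Softly penalize costly actions by the queue backlog (drift-plus-penalty flavor).
    scores = [probs[a] * math.exp(-queue * costs[a] / V) for a in range(n_actions)]
    z = sum(scores)
    probs = [s / z for s in scores]
    # Sample an action; the loss is observed only for the chosen action.
    a = rng.choices(range(n_actions), probs)[0]
    loss = rng.random() * (1.0 if a == 0 else 0.5)  # toy environment
    # Importance-weighted (unbiased) loss estimate for each action.
    est = [loss / probs[a] if i == a else 0.0 for i in range(n_actions)]
    # EXP4 weight update via each expert's expected estimated loss.
    weights = [w * math.exp(-eta * sum(adv[i] * est[i] for i in range(n_actions)))
               for w, adv in zip(weights, expert_advice)]
    # Queue grows when the realized cost exceeds the per-round budget.
    queue = max(queue + costs[a] - budget, 0.0)
    return weights, queue, a
```

A usage loop would repeatedly call `exp4_lyapunov_step`, carrying `weights` and `queue` across rounds; the queue term steers the policy away from actions whose long-run cost exceeds the budget.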
Problem

Research questions and friction points this paper is trying to address.

hierarchical inference
partial feedback
policy-dependent feedback
online learning
importance-weighted estimation
Innovation

Methods, ideas, or system contributions that make the work stand out.

hierarchical inference
partial feedback
policy-dependent feedback
variance reduction
Lyapunov optimization
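The "variance reduction" idea listed above can be sketched generically (an illustration under assumed values, not the paper's estimator): subtracting a baseline before importance weighting keeps the estimate unbiased while shrinking its variance by the squared gap between the baseline and the true loss.

```python
import random

def ips(loss, p, observed):
    """Plain inverse-propensity estimate: unbiased, variance ~ loss**2 * (1-p)/p."""
    return loss / p if observed else 0.0

def ips_with_baseline(loss, p, observed, baseline):
    """Baseline-corrected (control-variate) estimate: still unbiased for any
    fixed baseline, with variance ~ (loss - baseline)**2 * (1-p)/p."""
    return baseline + ((loss - baseline) / p if observed else 0.0)

rng = random.Random(0)
p, true_loss, baseline = 0.1, 1.0, 0.9  # illustrative numbers
plain, reduced = [], []
for _ in range(100_000):
    observed = rng.random() < p
    plain.append(ips(true_loss, p, observed))
    reduced.append(ips_with_baseline(true_loss, p, observed, baseline))
# Both estimators average to the true loss; the baseline version has far
# lower variance when the baseline is close to the truth.
```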
Haoran Zhang
Department of Electrical and Computer Engineering, The University of Texas at Austin
Seohyeon Cha
Department of Electrical and Computer Engineering, The University of Texas at Austin
Hasan Burhan Beytur
Department of Electrical and Computer Engineering, The University of Texas at Austin
Kevin S Chan
DEVCOM Army Research Laboratory
Gustavo de Veciana
Professor of Electrical and Computer Engineering, U.T. Austin
Communication Systems · Networks · Performance
Haris Vikalo
Professor, University of Texas at Austin