🤖 AI Summary
This work addresses efficient and robust damage-avoidance learning in complex, high-dimensional, long-horizon environments such as biomechanical digital twins. The authors propose Afferent Learning, a two-level framework: an outer loop uses evolutionary optimization to discover afferent sensing architectures, which supply biologically inspired risk signals as an inductive bias, while an inner loop trains reinforcement learning policies on the resulting Computational Afferent Traces (CATs). Notably, architectures are selected for how well they enable learning rather than for directly minimizing damage. In experiments on digital twins simulated over multiple decades of the life-course, evolved CAT-based architectures achieve significantly higher learning efficiency and better age-robustness than hand-designed baselines, and the resulting policies show age-dependent behavioral adaptation (a 23% reduction in high-risk actions). Ablation studies identify CAT signals, evolution, and predictive discrepancy as essential components.
📝 Abstract
We introduce Afferent Learning, a framework that produces Computational Afferent Traces (CATs) as adaptive, internal risk signals for damage-avoidance learning. Inspired by biological systems, the framework uses a two-level architecture: evolutionary optimization (outer loop) discovers afferent sensing architectures that enable effective policy learning, while reinforcement learning (inner loop) trains damage-avoidance policies using these signals. This formalizes afferent sensing as providing an inductive bias for efficient learning: architectures are selected based on their ability to enable effective learning (rather than directly minimizing damage). We provide theoretical convergence guarantees under smoothness and bounded-noise assumptions. We illustrate the general approach in the challenging context of biomechanical digital twins operating over long time horizons (multiple decades of the life-course). Here, we find that CAT-based evolved architectures achieve significantly higher efficiency and better age-robustness than hand-designed baselines, enabling policies that exhibit age-dependent behavioral adaptation (23% reduction in high-risk actions). Ablation studies validate CAT signals, evolution, and predictive discrepancy as essential. We release code and data for reproducibility.
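To make the two-level structure concrete, the following is a minimal toy sketch, not the authors' implementation. Everything in it is an assumption made for illustration: the five-level `LOADS` action set, the damage model, the two-parameter `(gain, threshold)` genome, the epsilon-greedy learner standing in for the reinforcement learning inner loop, and the use of mean outcome during training as a proxy for learning efficiency. It shows only the shape of the idea: the outer loop scores a sensing configuration by how well a learner does when trained on its signal.

```python
import random

LOADS = [0.0, 0.25, 0.5, 0.75, 1.0]  # hypothetical action set: exertion levels
DAMAGE_COST = 5.0                     # hypothetical cost of one damage event


def true_outcome(load, rng):
    """Toy environment: task reward grows with load; damage is sparse but costly."""
    damaged = rng.random() < load ** 3  # damage probability rises steeply with load
    return load - (DAMAGE_COST if damaged else 0.0)


def cat_signal(load, genome):
    """Hypothetical afferent trace: dense risk signal once load exceeds a threshold."""
    gain, threshold = genome
    return gain * max(0.0, load - threshold)


def inner_loop(genome, steps=200, seed=0):
    """Inner loop: epsilon-greedy value learning on (task reward - CAT signal).

    The learner never observes damage directly. Returns the mean true outcome
    over the whole training run, used here as a learning-efficiency proxy.
    """
    rng = random.Random(seed)
    q = [0.0] * len(LOADS)
    visits = [0] * len(LOADS)
    total = 0.0
    for _ in range(steps):
        if rng.random() < 0.1:  # explore
            a = rng.randrange(len(LOADS))
        else:                   # exploit current value estimates
            a = max(range(len(LOADS)), key=lambda i: q[i])
        load = LOADS[a]
        total += true_outcome(load, rng)
        internal = load - cat_signal(load, genome)
        visits[a] += 1
        q[a] += (internal - q[a]) / visits[a]  # incremental mean
    return total / steps


def fitness(genome, seeds=(0, 1, 2)):
    """Outer-loop objective: average learning-efficiency proxy over a few seeds."""
    return sum(inner_loop(genome, seed=s) for s in seeds) / len(seeds)


def evolve(generations=15, pop_size=8, seed=0):
    """Outer loop: simple elitist evolution over (gain, threshold) genomes."""
    rng = random.Random(seed)
    population = [(rng.uniform(0, 1), rng.uniform(0, 1)) for _ in range(pop_size)]
    for _ in range(generations):
        parents = sorted(population, key=fitness, reverse=True)[: pop_size // 2]
        children = [
            (max(0.0, g + rng.gauss(0, 0.5)),            # mutate gain, keep it >= 0
             min(1.0, max(0.0, t + rng.gauss(0, 0.1))))  # mutate threshold in [0, 1]
            for g, t in parents
        ]
        population = parents + children  # parents survive, so the best never regresses
    return max(population, key=fitness)


if __name__ == "__main__":
    best = evolve()
    print("evolved genome (gain, threshold):", best)
    print("fitness with evolved signal:", fitness(best))
    print("fitness with no afferent signal:", fitness((0.0, 0.0)))
```

In this toy setting a learner with no afferent signal is drawn toward the highest load, because it only sees the task reward. The outer loop never penalizes damage explicitly in the learner's signal design; it just favors genomes whose signal lets the inner learner accumulate better outcomes while it is still learning, which loosely mirrors the abstract's distinction between selecting for learnability and directly minimizing damage.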