Sparse Brains are Also Adaptive Brains: Cognitive-Load-Aware Dynamic Activation for LLMs

📅 2025-02-26
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Fixed full-parameter activation in large language models (LLMs) incurs significant computational inefficiency during inference. Method: We propose CLADA, a cognitive-load-aware dynamic activation framework that establishes the first quantitative linkage between neurolinguistic ERP components (N400/P600) and LLM sparsification mechanisms. CLADA integrates global statistical sparsity with local semantic adaptivity, enabling real-time, architecture-agnostic, and retraining-free parameter modulation. It employs prefix-statistics-driven offline error-bounded optimization and hierarchical threshold tuning guided by cognitive-load signals such as surprisal and entropy. Contribution/Results: Evaluated on six mainstream LLMs and nine benchmarks, CLADA achieves an average 20% speedup with <2% accuracy degradation, outperforming Griffin and TT. Multilevel regression analysis confirms a sparse-adaptation synergy effect (R² = 0.17), substantiating its neurocognitive interpretability.

๐Ÿ“ Abstract
Dense large language models(LLMs) face critical efficiency bottlenecks as they rigidly activate all parameters regardless of input complexity. While existing sparsity methods(static pruning or dynamic activation) address this partially, they either lack adaptivity to contextual or model structural demands or incur prohibitive computational overhead. Inspired by human brain's dual-process mechanisms - predictive coding (N400) for backbone sparsity and structural reanalysis (P600) for complex context - we propose CLADA, a extit{ extbf{C}ognitive- extbf{L}oad- extbf{A}ware extbf{D}ynamic extbf{A}ctivation} framework that synergizes statistical sparsity with semantic adaptability. Our key insight is that LLM activations exhibit two complementary patterns: 1) extit{Global statistical sparsity} driven by sequence-level prefix information, and 2) extit{Local semantic adaptability} modulated by cognitive load metrics(e.g., surprisal and entropy). CLADA employs a hierarchical thresholding strategy: a baseline from offline error-controlled optimization ensures 40%+ sparsity, dynamically adjusted by real-time cognitive signals. Evaluations across six mainstream LLMs and nine benchmarks demonstrate that CLADA achieves extbf{~20% average speedup with<2% accuracy drop}, outperforming Griffin (5%+ degradation) and TT (negligible speedup). Crucially, we establish the first formal connection between neurolinguistic event-related potential (ERP) components and LLM efficiency mechanisms through multi-level regression analysis ($R^2=0.17$ for sparsity-adaptation synergy). Requiring no retraining or architectural changes, CLADA offers a deployable solution for resource-aware LLM inference while advancing biologically-inspired AI design. Our code is available at href{https://github.com/Oldify/CLADA}{CLADA}.
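The cognitive-load signals named in the abstract, surprisal (the N400-like component) and entropy (the P600-like component), can both be computed from a model's next-token logits. A minimal sketch of how such signals might modulate a baseline sparsity threshold is shown below; the function name, the linear combination, and the `alpha`/`beta` constants are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def softmax(logits):
    # numerically stable softmax over a 1-D logit vector
    z = logits - logits.max()
    p = np.exp(z)
    return p / p.sum()

def cognitive_load_threshold(logits, next_token_id,
                             base_threshold=0.4, alpha=0.1, beta=0.1):
    """Hypothetical sketch: keep the offline baseline threshold when
    cognitive load is low, and lower it (activating more parameters)
    when load is high. alpha/beta are illustrative weights."""
    p = softmax(logits)
    surprisal = -np.log(p[next_token_id] + 1e-12)  # N400-like signal
    entropy = -np.sum(p * np.log(p + 1e-12))       # P600-like signal
    # higher load -> smaller threshold -> denser activation
    return base_threshold / (1.0 + alpha * surprisal + beta * entropy)
```

On a confidently predicted token the threshold stays near the 40% baseline; on a high-surprisal, high-entropy token it drops, which matches the abstract's claim that complex context should recruit more parameters.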
Problem

Research questions and friction points this paper is trying to address.

Efficiency bottlenecks in dense LLMs
Lack of adaptivity in sparsity methods
Synergizing statistical sparsity with semantic adaptability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic activation based on cognitive load
Hierarchical thresholding for sparsity optimization
Neurolinguistic-inspired efficiency mechanisms
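Once a threshold is chosen, dynamic activation amounts to gating low-magnitude neurons per token. The sketch below illustrates one plausible magnitude-quantile gating scheme; it is an assumption for illustration, not CLADA's published implementation.

```python
import numpy as np

def dynamic_activation_mask(hidden, threshold):
    """Hypothetical illustration of threshold-based dynamic activation:
    neurons whose absolute activation falls below the per-token quantile
    cutoff are zeroed, so their downstream computation can be skipped.
    `threshold` is the target sparsity fraction in [0, 1)."""
    cutoff = np.quantile(np.abs(hidden), threshold)
    mask = np.abs(hidden) >= cutoff
    return hidden * mask, mask
```

Because the cutoff is a quantile of the current token's activations, roughly a `threshold` fraction of neurons is pruned on every step, regardless of the activation scale.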