Sparse Brains are Also Adaptive Brains: Cognitive-Load-Aware Dynamic Activation for LLMs

📅 2025-02-26
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Fixed full-parameter activation in large language models (LLMs) incurs significant computational inefficiency during inference. Method: We propose CLADA, a cognitive-load-aware dynamic activation framework that establishes the first quantitative linkage between neurolinguistic ERP components (N400/P600) and LLM sparsification mechanisms. CLADA integrates global statistical sparsity with local semantic adaptivity, enabling real-time, architecture-agnostic, and retraining-free parameter modulation. It employs prefix-statistics-driven offline error-bounded optimization and hierarchical threshold tuning guided by cognitive-load signals such as surprisal and entropy. Contribution/Results: Evaluated on six mainstream LLMs and nine benchmarks, CLADA achieves an average 20% speedup with <2% accuracy degradation, outperforming Griffin and TT. Multilevel regression analysis confirms a sparse-adaptation synergy effect (R² = 0.17), substantiating its neurocognitive interpretability.

๐Ÿ“ Abstract
Dense large language models(LLMs) face critical efficiency bottlenecks as they rigidly activate all parameters regardless of input complexity. While existing sparsity methods(static pruning or dynamic activation) address this partially, they either lack adaptivity to contextual or model structural demands or incur prohibitive computational overhead. Inspired by human brain's dual-process mechanisms - predictive coding (N400) for backbone sparsity and structural reanalysis (P600) for complex context - we propose CLADA, a extit{ extbf{C}ognitive- extbf{L}oad- extbf{A}ware extbf{D}ynamic extbf{A}ctivation} framework that synergizes statistical sparsity with semantic adaptability. Our key insight is that LLM activations exhibit two complementary patterns: 1) extit{Global statistical sparsity} driven by sequence-level prefix information, and 2) extit{Local semantic adaptability} modulated by cognitive load metrics(e.g., surprisal and entropy). CLADA employs a hierarchical thresholding strategy: a baseline from offline error-controlled optimization ensures 40%+ sparsity, dynamically adjusted by real-time cognitive signals. Evaluations across six mainstream LLMs and nine benchmarks demonstrate that CLADA achieves extbf{~20% average speedup with<2% accuracy drop}, outperforming Griffin (5%+ degradation) and TT (negligible speedup). Crucially, we establish the first formal connection between neurolinguistic event-related potential (ERP) components and LLM efficiency mechanisms through multi-level regression analysis ($R^2=0.17$ for sparsity-adaptation synergy). Requiring no retraining or architectural changes, CLADA offers a deployable solution for resource-aware LLM inference while advancing biologically-inspired AI design. Our code is available at href{https://github.com/Oldify/CLADA}{CLADA}.
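The cognitive-load signals named in the abstract, surprisal (the N400-like component) and entropy (the P600-like component), can both be computed from a model's next-token logits. A minimal sketch of how such signals might modulate a baseline sparsity threshold is shown below; the function name, the linear combination, and the `alpha`/`beta` constants are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def softmax(logits):
    # numerically stable softmax over a 1-D logit vector
    z = logits - logits.max()
    p = np.exp(z)
    return p / p.sum()

def cognitive_load_threshold(logits, next_token_id,
                             base_threshold=0.4, alpha=0.1, beta=0.1):
    """Hypothetical sketch: keep the offline baseline threshold when
    cognitive load is low, and lower it (activating more parameters)
    when load is high. alpha/beta are illustrative weights."""
    p = softmax(logits)
    surprisal = -np.log(p[next_token_id] + 1e-12)  # N400-like signal
    entropy = -np.sum(p * np.log(p + 1e-12))       # P600-like signal
    # higher load -> smaller threshold -> denser activation
    return base_threshold / (1.0 + alpha * surprisal + beta * entropy)
```

On a confidently predicted token the threshold stays near the 40% baseline; on a high-surprisal, high-entropy token it drops, which matches the abstract's claim that complex context should recruit more parameters.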
Problem

Research questions and friction points this paper is trying to address.

Efficiency bottlenecks in dense LLMs
Lack of adaptivity in sparsity methods
Synergizing statistical sparsity with semantic adaptability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic activation based on cognitive load
Hierarchical thresholding for sparsity optimization
Neurolinguistic-inspired efficiency mechanisms
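Once a threshold is chosen, dynamic activation amounts to gating low-magnitude neurons per token. The sketch below illustrates one plausible magnitude-quantile gating scheme; it is an assumption for illustration, not CLADA's published implementation.

```python
import numpy as np

def dynamic_activation_mask(hidden, threshold):
    """Hypothetical illustration of threshold-based dynamic activation:
    neurons whose absolute activation falls below the per-token quantile
    cutoff are zeroed, so their downstream computation can be skipped.
    `threshold` is the target sparsity fraction in [0, 1)."""
    cutoff = np.quantile(np.abs(hidden), threshold)
    mask = np.abs(hidden) >= cutoff
    return hidden * mask, mask
```

Because the cutoff is a quantile of the current token's activations, roughly a `threshold` fraction of neurons is pruned on every step, regardless of the activation scale.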