🤖 AI Summary
This study systematically investigates how memristor non-idealities affect the inference performance and reliability of large language models (LLMs) deployed on compute-in-memory architectures. To counter typical hardware-induced imperfections, the work evaluates three training-free robustness strategies (chain-of-thought "thinking mode" prompting, in-context learning, and shallow-layer redundancy) across diverse reasoning tasks. The experiments reveal, for the first time, significant variation in task sensitivity to non-idealities: shallow-layer redundancy substantially improves robustness, in-context learning shortens outputs with only a slight performance trade-off, and thinking mode helps only under low-noise conditions and degrades at higher noise levels. Based on these findings, the study formulates practical design guidelines for deploying reliable LLMs on non-ideal memristor-based hardware.
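The non-idealities discussed above are often approximated in simulation as random perturbations of the programmed weights. A minimal sketch of this idea, assuming a multiplicative Gaussian device-variation model (an illustrative assumption, not the paper's exact noise model or parameters):

```python
import numpy as np

def program_weights(w, noise_std, rng):
    """Simulate writing ideal weights onto a memristor crossbar.

    Device variation is modeled as multiplicative Gaussian noise;
    this is an illustrative stand-in for real non-idealities such as
    conductance drift and programming variation.
    """
    return w * (1.0 + rng.normal(0.0, noise_std, size=w.shape))

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64))   # toy layer weights
x = rng.standard_normal(64)         # toy activation vector
ideal = w @ x

# Higher device noise leads to larger matrix-vector product error.
for noise_std in (0.01, 0.05, 0.1):
    w_noisy = program_weights(w, noise_std, rng)
    err = np.linalg.norm(w_noisy @ x - ideal) / np.linalg.norm(ideal)
    print(f"noise_std={noise_std:.2f}  relative error={err:.3f}")
```

In an LLM-scale study, such a perturbation would be applied to every weight matrix before inference; here a single layer suffices to show how analog error propagates into the output.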
📝 Abstract
Memristor-based analog compute-in-memory (CIM) architectures are a promising substrate for the efficient deployment of Large Language Models (LLMs), owing to their superior energy efficiency and computational density. However, these architectures suffer from precision loss caused by the intrinsic non-idealities of memristors. In this paper, we first conduct a comprehensive investigation into the impact of typical non-idealities on LLM reasoning. Empirical results show that reasoning capability degrades significantly, with the degree of degradation varying across benchmarks. We then systematically evaluate three training-free mitigation strategies: thinking mode, in-context learning, and module redundancy. From these experiments we distill practical guidelines: shallow-layer redundancy is particularly effective at improving robustness, thinking mode helps at low noise levels but degrades at higher noise, and in-context learning shortens outputs with only a slight performance trade-off. Our findings offer new insights into LLM reasoning under non-ideality and practical strategies for improving robustness.
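The redundancy guideline can be illustrated with the same kind of weight-noise model: programming one (here, shallow-layer) weight matrix onto k independent crossbars and averaging their outputs shrinks the effective error standard deviation by roughly sqrt(k). A sketch under the same illustrative multiplicative-Gaussian assumption (not the paper's exact redundancy scheme):

```python
import numpy as np

def redundant_matvec(w, x, k, noise_std, rng):
    """Average k independently programmed noisy realizations of w @ x.

    Because the device noise on each copy is independent, averaging
    reduces the error standard deviation by about sqrt(k)
    (illustrative model of module redundancy).
    """
    outs = [
        (w * (1.0 + rng.normal(0.0, noise_std, size=w.shape))) @ x
        for _ in range(k)
    ]
    return np.mean(outs, axis=0)

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64))
x = rng.standard_normal(64)
ideal = w @ x

# More redundant copies lead to a smaller residual error.
for k in (1, 4, 16):
    y = redundant_matvec(w, x, k, noise_std=0.1, rng=rng)
    err = np.linalg.norm(y - ideal) / np.linalg.norm(ideal)
    print(f"k={k:2d}  relative error={err:.4f}")
```

This also hints at why redundancy is most attractive for shallow layers: errors introduced early are amplified as they propagate through subsequent layers, so suppressing them at the source yields the largest end-to-end benefit.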