Analog Foundation Models

📅 2025-05-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
Deploying large language models (LLMs) on analog in-memory computing (AIMC) hardware suffers from severe performance degradation due to analog noise and ultra-low-precision weights (e.g., 4-bit). Method: This paper proposes a general, scalable joint optimization framework integrating noise-robust fine-tuning, quantization-aware training, and hardware-aware noise modeling. Contribution/Results: It is the first generic method enabling robust inference on noisy AIMC hardware for LLMs pre-trained on trillions of tokens, substantially bridging the gap between LLM capacity and efficient analog acceleration. As a byproduct, it also enables seamless transfer to low-precision digital hardware and exhibits strong test-time compute scaling. On Phi-3-mini-4k-instruct and Llama-3.2-1B-Instruct, AIMC deployment achieves performance comparable to digital baselines using 4-bit weights and 8-bit activations while significantly reducing power consumption.

📝 Abstract
Analog in-memory computing (AIMC) is a promising compute paradigm to improve speed and power efficiency of neural network inference beyond the limits of conventional von Neumann-based architectures. However, AIMC introduces fundamental challenges such as noisy computations and strict constraints on input and output quantization. Because of these constraints and imprecisions, off-the-shelf LLMs are not able to achieve 4-bit-level performance when deployed on AIMC-based hardware. While researchers previously investigated recovering this accuracy gap on small, mostly vision-based models, a generic method applicable to LLMs pre-trained on trillions of tokens does not yet exist. In this work, we introduce a general and scalable method to robustly adapt LLMs for execution on noisy, low-precision analog hardware. Our approach enables state-of-the-art models – including Phi-3-mini-4k-instruct and Llama-3.2-1B-Instruct – to retain performance comparable to 4-bit weight, 8-bit activation baselines, despite the presence of analog noise and quantization constraints. Additionally, we show that as a byproduct of our training methodology, analog foundation models can be quantized for inference on low-precision digital hardware. Finally, we show that our models also benefit from test-time compute scaling, showing better scaling behavior than models trained with 4-bit weight and 8-bit static input quantization. Our work bridges the gap between high-capacity LLMs and efficient analog hardware, offering a path toward energy-efficient foundation models. Code is available at https://github.com/IBM/analog-foundation-models.
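The hardware constraints the abstract describes (low-bit weights, quantized inputs, noisy analog computation) can be illustrated with a minimal numpy sketch. The function names and the proportional Gaussian output-noise model below are illustrative assumptions for intuition, not the paper's actual hardware model or code:

```python
import numpy as np

def fake_quant(x, bits):
    """Symmetric per-tensor fake quantization: snap values to a signed
    (2**bits - 1)-level grid, returning floats (as in quantization-aware training)."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax
    if scale == 0:
        return x
    return np.round(x / scale) * scale

def noisy_aimc_matvec(w, x, w_bits=4, x_bits=8, noise_std=0.02, rng=None):
    """Toy model of an AIMC tile: 4-bit weights, 8-bit inputs, plus additive
    Gaussian output noise scaled to the output range (a simplifying assumption)."""
    rng = np.random.default_rng() if rng is None else rng
    y = fake_quant(w, w_bits) @ fake_quant(x, x_bits)
    noise = rng.normal(0.0, noise_std * (np.max(np.abs(y)) + 1e-12), size=y.shape)
    return y + noise

w = np.random.default_rng(0).standard_normal((8, 16))
x = np.random.default_rng(1).standard_normal(16)
y = noisy_aimc_matvec(w, x)  # a vector of 8 noisy, quantized dot products
```

Training through a forward pass like this one, rather than through an exact matmul, is the basic idea behind making a model robust to analog deployment.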
Problem

Research questions and friction points this paper is trying to address.

Adapting LLMs for noisy, low-precision analog hardware
Closing the accuracy gap to 4-bit digital performance on AIMC
Enabling energy-efficient foundation models via analog computing
Innovation

Methods, ideas, or system contributions that make the work stand out.

General method adapts LLMs for noisy analog hardware
Enables 4-bit weight, 8-bit activation performance retention
Supports quantization for low-precision digital hardware
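The quantization byproduct listed above rests on a standard quantization-aware-training idea: run the forward pass with quantized weights while updating a full-precision shadow copy (a straight-through-estimator update). A toy linear-model sketch, where the 4-bit grid, learning rate, and data are illustrative assumptions rather than the paper's recipe:

```python
import numpy as np

def quant4(w):
    """4-bit symmetric fake quantization: floats snapped to a 15-level grid."""
    s = np.max(np.abs(w)) / 7 + 1e-12
    return np.round(w / s) * s

rng = np.random.default_rng(0)
X = rng.standard_normal((256, 8))
w_true = rng.standard_normal(8)
y = X @ w_true

w = np.zeros(8)                       # full-precision "shadow" weights
for _ in range(500):
    err = X @ quant4(w) - y           # forward pass sees only quantized weights
    grad = X.T @ err / len(X)         # straight-through: gradient of the quantized
    w -= 0.1 * grad                   # forward, applied to the full-precision copy

mse = np.mean((X @ quant4(w) - y) ** 2)  # loss of the deployable 4-bit model
```

After training, `quant4(w)` can be deployed directly on a 4-bit grid, which is the sense in which such models transfer to low-precision digital hardware.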
Julian Buchel
IBM Research – Zurich, ETH Zürich
Iason Chalas
IBM Research – Zurich
Giovanni Acampa
IBM Research – Zurich
An Chen
IBM Research – Almaden
Omobayode Fagbohungbe
IBM Thomas J. Watson Research Center
Sidney Tsai
IBM Research – Almaden
K. E. Maghraoui
IBM Thomas J. Watson Research Center
Manuel Le Gallo
IBM Research – Zurich
Abbas Rahimi
Research Staff Member, IBM Research – Zurich
Abu Sebastian
Distinguished Scientist, IBM Research – Zurich