Streamlined optical training of large-scale modern deep learning architectures with direct feedback alignment

📅 2024-09-01
📈 Citations: 1
✹ Influential: 0
📄 PDF
đŸ€– AI Summary
To address the high energy consumption and computational bottlenecks of electronic hardware in deep learning training, this work presents the first experimental implementation of Direct Feedback Alignment (DFA) on a hybrid photonic-electronic platform, overcoming the longstanding limitation that optical training has been restricted to shallow models. The authors propose a photonic-electronic co-processing DFA paradigm: photonic processing units efficiently perform random matrix multiplications, while electronic circuits handle error feedback and parameter updates, thereby circumventing the fundamental challenge of gradient propagation in the optical domain inherent to backpropagation. The system operates under 30 W of power and achieves a peak computational throughput of 1500 TeraOPS. The authors demonstrate end-to-end optical training of a Transformer model with over one billion parameters, attaining competitive performance across multimodal tasks, including language understanding, image classification, and diffusion-based generation, while significantly outperforming equivalently scaled all-electronic training systems in speed.
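To make the co-processing split concrete, here is a minimal NumPy sketch of DFA on a toy multilayer perceptron. The network sizes, seed, tanh activations, and single training sample are illustrative assumptions, not details from the paper; the fixed random projections `B @ e` are the operation the paper offloads to the optical processing unit, while the weight updates play the role of the electronic side.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 3-layer MLP trained with Direct Feedback Alignment (DFA).
# Sizes and data are hypothetical; in the paper's system the products
# B @ e would be computed optically at scale.
n_in, n_h, n_out = 8, 16, 4
W1 = rng.normal(0, 0.5, (n_h, n_in))
W2 = rng.normal(0, 0.5, (n_h, n_h))
W3 = rng.normal(0, 0.5, (n_out, n_h))

# Fixed random feedback matrices, never trained -- the DFA signature.
# Backpropagation would use W3.T (and products of transposes) instead.
B1 = rng.normal(0, 0.5, (n_h, n_out))
B2 = rng.normal(0, 0.5, (n_h, n_out))

tanh = np.tanh
dtanh = lambda a: 1.0 - a**2  # derivative in terms of the activation

x = rng.normal(size=(n_in,))
y = np.eye(n_out)[1]  # one-hot target for a single toy sample

def loss():
    out = W3 @ tanh(W2 @ tanh(W1 @ x))
    return 0.5 * np.sum((out - y) ** 2)

loss_before = loss()
lr = 0.05
for _ in range(200):
    a1 = tanh(W1 @ x)
    a2 = tanh(W2 @ a1)
    e = W3 @ a2 - y  # output error (linear output layer)

    # DFA: project the *output* error directly to each hidden layer
    # through the fixed random matrices; no transposed weights needed,
    # so all layer updates can proceed from one broadcast error signal.
    d2 = (B2 @ e) * dtanh(a2)
    d1 = (B1 @ e) * dtanh(a1)

    W3 -= lr * np.outer(e, a2)
    W2 -= lr * np.outer(d2, a1)
    W1 -= lr * np.outer(d1, x)

loss_after = loss()
```

Because every hidden layer receives the same projected error rather than a layer-by-layer backward pass, the expensive step reduces to large fixed random matrix multiplications, which is exactly the workload suited to the optical processor.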

📝 Abstract
Modern deep learning relies nearly exclusively on dedicated electronic hardware accelerators. Photonic approaches, with low consumption and high operation speed, are increasingly considered for inference but, to date, remain mostly limited to relatively basic tasks. Simultaneously, the problem of training deep and complex neural networks, overwhelmingly performed through backpropagation, remains a significant limitation to the size and, consequently, the performance of current architectures and a major compute and energy bottleneck. Here, we experimentally implement a versatile and scalable training algorithm, called direct feedback alignment, on a hybrid electronic-photonic platform. An optical processing unit performs large-scale random matrix multiplications, which is the central operation of this algorithm, at speeds up to 1500 TeraOPS under 30 Watts of power. We perform optical training of modern deep learning architectures, including Transformers, with more than 1B parameters, and obtain good performances on language, vision, and diffusion-based generative tasks. We study the scaling of the training time, and demonstrate a potential advantage of our hybrid opto-electronic approach for ultra-deep and wide neural networks, thus opening a promising route to sustain the exponential growth of modern artificial intelligence beyond traditional von Neumann approaches.
Problem

Research questions and friction points this paper is trying to address.

Training large deep networks efficiently with photonic methods
Overcoming backpropagation limitations in size and energy
Scaling optical training for ultra-deep neural networks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid electronic-photonic platform training
Optical processing for random matrix multiplications
Direct feedback alignment for deep networks
Ziao Wang
Laboratoire Kastler Brossel, École Normale SupĂ©rieure - UniversitĂ© PSL, Sorbonne UniversitĂ©, CollĂšge de France, CNRS, UMR 8552, Paris, France.
Kilian Muller
LightOn, 2 rue de la Bourse, 75002 Paris, France.
Matthew J. Filipovich
Clarendon Laboratory, University of Oxford, Parks Road, OX1 3PU, Oxford, United Kingdom.
Julien Launay
LightOn, 2 rue de la Bourse, 75002 Paris, France.
Ruben Ohana
Senior Research Scientist, NVIDIA
Machine Learning · AI for Science · Computer Vision · Optical Computing
Gustave Pariente
LightOn, 2 rue de la Bourse, 75002 Paris, France.
Safa Mokaadi
LightOn, 2 rue de la Bourse, 75002 Paris, France.
C. Brossollet
LightOn, 2 rue de la Bourse, 75002 Paris, France.
Fabien Moreau
LightOn, 2 rue de la Bourse, 75002 Paris, France.
Alessandro Cappelli
LightOn, 2 rue de la Bourse, 75002 Paris, France.
Iacopo Poli
LightOn, 2 rue de la Bourse, 75002 Paris, France.
I. Carron
LightOn, 2 rue de la Bourse, 75002 Paris, France.
L. Daudet
LightOn, 2 rue de la Bourse, 75002 Paris, France.
Florent Krzakala
École polytechnique fĂ©dĂ©rale de Lausanne
Statistical Mechanics · Statistics · Machine Learning · Information Theory · Spin Glasses
Sylvain Gigan
Professor, Sorbonne Université / Lab. Kastler-Brossel / CNRS / Ecole Normale Supérieure
Complex Media · Optical Computing · Computational Imaging