Low-rank surrogate modeling and stochastic zero-order optimization for training of neural networks with black-box layers

📅 2025-09-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of embedding non-differentiable physical devices, such as photonic integrated circuits and neuromorphic hardware, into end-to-end trainable hybrid neural networks. The authors propose a zeroth-order optimization framework that combines a dynamically updated low-rank surrogate model with an implicit projector-splitting integrator. Using only a small number of hardware queries, it approximates gradients of the black-box physical layer and updates its parameters without full matrix reconstruction. The method requires no hardware modification and makes no differentiability assumptions, enabling plug-and-play integration of diverse non-differentiable devices, including spatial light modulators and microring resonators. Evaluated on image, audio, and language-modeling tasks, it achieves performance comparable to fully digital baselines while significantly reducing training overhead. Empirical results demonstrate cross-modal generalizability, cross-platform compatibility, and robustness across heterogeneous physical substrates.
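The paper does not ship code, but the zeroth-order component it describes is close in spirit to simultaneous-perturbation (SPSA-style) finite differences. A minimal sketch, assuming the hardware exposes a loss-evaluation callable; the function name, query budget, and perturbation scale are all illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def spsa_gradient(loss_fn, theta, num_pairs=4, delta=1e-2, rng=None):
    """Zeroth-order estimate of the gradient of a black-box loss with
    respect to the physical layer's parameters theta. Each perturbation
    pair costs two forward passes (hardware queries), and nothing about
    loss_fn is assumed to be differentiable."""
    rng = rng or np.random.default_rng()
    grad = np.zeros_like(theta)
    for _ in range(num_pairs):
        # Random Rademacher (+/-1) perturbation direction.
        direction = rng.choice([-1.0, 1.0], size=theta.shape)
        loss_plus = loss_fn(theta + delta * direction)
        loss_minus = loss_fn(theta - delta * direction)
        # Central-difference estimate along this direction.
        grad += (loss_plus - loss_minus) / (2.0 * delta) * direction
    return grad / num_pairs
```

Averaging over a handful of directions trades extra hardware queries for lower variance, which matches the paper's emphasis on a small query budget.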

📝 Abstract
The growing demand for energy-efficient, high-performance AI systems has led to increased attention on alternative computing platforms (e.g., photonic, neuromorphic) due to their potential to accelerate learning and inference. However, integrating such physical components into deep learning pipelines remains challenging, as physical devices often offer limited expressiveness, and their non-differentiable nature renders on-device backpropagation difficult or infeasible. This motivates the development of hybrid architectures that combine digital neural networks with reconfigurable physical layers, which effectively behave as black boxes. In this work, we present a framework for the end-to-end training of such hybrid networks. This framework integrates stochastic zeroth-order optimization for updating the physical layer's internal parameters with a dynamic low-rank surrogate model that enables gradient propagation through the physical layer. A key component of our approach is the implicit projector-splitting integrator algorithm, which updates the lightweight surrogate model after each forward pass with minimal hardware queries, thereby avoiding costly full matrix reconstruction. We demonstrate our method across diverse deep learning tasks, including computer vision, audio classification, and language modeling. Notably, across all modalities, the proposed approach achieves near-digital baseline accuracy and consistently enables effective end-to-end training of hybrid models incorporating various non-differentiable physical components (spatial light modulators, microring resonators, and Mach-Zehnder interferometers). This work bridges hardware-aware deep learning and gradient-free optimization, thereby offering a practical pathway for integrating non-differentiable physical components into scalable, end-to-end trainable AI systems.
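To make the surrogate-based backward pass concrete, here is a minimal PyTorch sketch, assuming the device realizes an approximately linear map whose low-rank surrogate W ≈ U Vᵀ is maintained separately. `BlackBoxLinear` and `device_fn` are hypothetical names standing in for the hardware query; this illustrates the general mechanism, not the paper's exact implementation:

```python
import torch

class BlackBoxLinear(torch.autograd.Function):
    """Forward pass queries the physical device; backward pass propagates
    gradients through a low-rank surrogate W ~ U @ V.T instead."""

    @staticmethod
    def forward(ctx, x, U, V, device_fn):
        ctx.save_for_backward(U, V)
        return device_fn(x)  # hardware query; not differentiable

    @staticmethod
    def backward(ctx, grad_out):
        U, V = ctx.saved_tensors
        # dL/dx ~ grad_out @ (U @ V.T).T, evaluated factor-by-factor so
        # the full surrogate matrix is never materialized.
        grad_x = (grad_out @ V) @ U.T
        # The device's own parameters are handled by the zeroth-order
        # loop, not by autograd, hence no gradients for U, V, device_fn.
        return grad_x, None, None, None

# Toy usage with a stand-in "device" (here just the surrogate itself):
x = torch.randn(8, 32, requires_grad=True)
U, V = torch.randn(32, 4), torch.randn(16, 4)  # rank-4 surrogate factors
y = BlackBoxLinear.apply(x, U, V, lambda z: z @ (U @ V.T))
y.sum().backward()  # gradients reach x via the surrogate
```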
Problem

Research questions and friction points this paper is trying to address.

Training neural networks with non-differentiable black-box layers
Enabling gradient propagation through physical components
Integrating hardware-aware deep learning with optimization methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Stochastic zeroth-order optimization for physical layers
Dynamic low-rank surrogate model for gradient propagation
Implicit projector-splitting integrator for efficient surrogate updates (see the sketch below)
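The projector-splitting integrator comes from dynamical low-rank approximation (Lubich and Oseledets). As a point of reference, the sketch below implements the standard explicit first-order KSL step, which advances the factors of W ≈ U S Vᵀ without ever reconstructing the full matrix; the paper's implicit, query-driven variant differs in detail, and the dense increment `dW` here stands in for products that would in practice be estimated from a few device queries:

```python
import numpy as np

def projector_splitting_step(U, S, Vt, dW, dt=1.0):
    """One explicit first-order KSL projector-splitting step: advance the
    rank-r factorization W ~ U @ S @ Vt by an increment dt * dW while
    staying on the rank-r manifold. Shapes: U (m, r), S (r, r), Vt (r, n),
    dW (m, n); only products of dW with thin factors are ever formed."""
    # K-step: advance K = U @ S along dW @ V, then re-orthogonalize.
    U1, S_hat = np.linalg.qr(U @ S + dt * dW @ Vt.T)
    # S-step: subtract the doubly projected increment (the minus sign is
    # the hallmark of the splitting scheme).
    S_tilde = S_hat - dt * U1.T @ dW @ Vt.T
    # L-step: advance L = V @ S.T along dW.T @ U1, then re-orthogonalize.
    V1, S1t = np.linalg.qr(Vt.T @ S_tilde.T + dt * dW.T @ U1)
    return U1, S1t.T, V1.T
```

Because each step touches only m×r and n×r factor matrices, the per-update cost scales with the rank rather than with the full layer dimensions, which is what makes frequent surrogate refreshes affordable.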