🤖 AI Summary
This work proposes a hybrid quantum photonic–classical framework that leverages hardware-native randomness to improve knowledge-distillation efficiency and reduce student-network complexity. For the first time, intrinsic randomness from programmable photonic circuits is integrated into knowledge distillation, guiding the student model through dictionary-based convolutions and a gradient-free photonic parameter-update mechanism. Coupled with exponential-moving-average feature smoothing, the approach traces a controllable trade-off between model compression and accuracy on the MNIST, Fashion-MNIST, and CIFAR-10 benchmarks, remaining close to teacher performance on the simpler benchmarks even under aggressive compression. Moreover, the performance degradation observed under limited sampling follows shot-noise scaling, supporting the physical plausibility and practical viability of the proposed method.
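For orientation, the two mechanisms named above have standard textbook forms; the sketch below is illustrative rather than taken from the paper, and the symbols β (smoothing factor), f_t (per-step photonic feature), and N_shots (number of measurement repetitions) are assumptions introduced here:

```latex
% Exponential moving average of a per-step photonic feature f_t
\bar{f}_t = \beta\,\bar{f}_{t-1} + (1-\beta)\,f_t
% Shot-noise scaling: statistical error of an outcome-probability estimate
% built from N_shots repeated measurements
\sigma(\hat{p}) = \sqrt{\frac{p(1-p)}{N_{\mathrm{shots}}}} \;\propto\; \frac{1}{\sqrt{N_{\mathrm{shots}}}}
```

The second relation is the generic binomial-sampling bound that motivates the claim that accuracy degrades predictably as the shot budget shrinks.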
📝 Abstract
Photonic quantum processors naturally produce intrinsically stochastic measurement outcomes, offering a hardware-native source of structured randomness that can be exploited during machine-learning training. Here we introduce Photonic Quantum-Enhanced Knowledge Distillation (PQKD), a hybrid quantum photonic–classical framework in which a programmable photonic circuit generates a compact conditioning signal that constrains and guides a parameter-efficient student network during distillation from a high-capacity teacher. PQKD replaces fully trainable convolutional kernels with dictionary convolutions: each layer learns only a small set of shared spatial basis filters, while sample-dependent channel-mixing weights are derived from shot-limited photonic features and mapped through a fixed linear transform. Training alternates between standard gradient-based optimisation of the student and sampling-robust, gradient-free updates of photonic parameters, avoiding differentiation through photonic hardware. Across MNIST, Fashion-MNIST and CIFAR-10, PQKD traces a controllable compression–accuracy frontier, remaining close to teacher performance on simpler benchmarks under aggressive convolutional compression. Performance degrades predictably with finite sampling, consistent with shot-noise scaling, and exponential moving-average feature smoothing suppresses high-frequency shot-noise fluctuations, extending the practical operating regime at moderate shot budgets.
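The dictionary-convolution idea can be illustrated with a minimal PyTorch sketch. Everything below is an assumption for illustration, not the paper's implementation: the class and parameter names (`DictionaryConv2d`, `num_basis`, `photonic_dim`), the fixed random matrix standing in for the "fixed linear transform", and the `photonic_feat` tensor standing in for shot-limited photonic measurement features. The point it demonstrates is the parameter saving: only `num_basis` small spatial filters are trained per layer, while the per-sample channel mixing is not a trained convolution kernel.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DictionaryConv2d(nn.Module):
    """Sketch of a dictionary convolution: only K shared spatial basis filters
    are trainable; per-sample channel-mixing weights come from a (stand-in)
    photonic feature vector through a fixed, non-trainable linear map."""

    def __init__(self, in_ch, out_ch, num_basis=4, kernel_size=3, photonic_dim=16):
        super().__init__()
        # Trainable dictionary: K spatial filters shared by all channels.
        self.basis = nn.Parameter(0.1 * torch.randn(num_basis, 1, kernel_size, kernel_size))
        # Fixed linear map from photonic features to mixing weights (a buffer, not a parameter).
        mix = torch.randn(photonic_dim, out_ch * in_ch * num_basis) / photonic_dim ** 0.5
        self.register_buffer("mix_map", mix)
        self.out_ch, self.K = out_ch, num_basis
        self.pad = kernel_size // 2

    def forward(self, x, photonic_feat):
        B, C, H, W = x.shape
        # Convolve every input channel with every basis filter using only K trainable kernels.
        resp = F.conv2d(x.reshape(B * C, 1, H, W), self.basis, padding=self.pad)
        resp = resp.reshape(B, C * self.K, H, W)
        # Sample-dependent mixing weights derived from the photonic feature vector.
        w = (photonic_feat @ self.mix_map).reshape(B, self.out_ch, C * self.K)
        # Per-sample 1x1 mixing of dictionary responses into output channels.
        return torch.einsum("bok,bkhw->bohw", w, resp)

# Usage sketch: random tensors stand in for images and photonic features.
layer = DictionaryConv2d(in_ch=3, out_ch=32)
x = torch.randn(8, 3, 32, 32)
phot = torch.rand(8, 16)      # stand-in for shot-limited photonic features
y = layer(x, phot)            # -> shape (8, 32, 32, 32)
```

In this sketch the trainable convolutional parameters per layer shrink from out_ch × in_ch × k × k to num_basis × k × k, which is the kind of compression the abstract refers to; how the paper actually forms and normalises the mixing weights is not specified here.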