A Unified Framework for Lifted Training and Inversion Approaches

📅 2025-10-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address inherent limitations of backpropagation—including vanishing/exploding gradients, poor handling of nonsmooth activations, and limited parallelizability—this paper proposes a unified training framework based on constrained optimization. It formulates deep neural network training as a high-dimensional constrained optimization problem, relaxes constraints via penalty terms, and improves manifold conditioning using Bregman distances. The framework uniquely unifies proximal-type nonsmooth activations, distributed optimization, and joint forward-inverse problem solving, integrating auxiliary coordinate methods, Fenchel duality, and block-coordinate descent, while supporting acceleration and adaptive stochastic/deterministic updates. Experimental validation on standard imaging tasks demonstrates that, compared to conventional backpropagation, the method achieves significantly improved convergence speed, numerical stability, and robustness—particularly for networks employing proximal activations.

📝 Abstract
The training of deep neural networks predominantly relies on a combination of gradient-based optimisation and back-propagation for the computation of the gradient. While incredibly successful, this approach faces challenges such as vanishing or exploding gradients, difficulties with non-smooth activations, and an inherently sequential structure that limits parallelisation. Lifted training methods offer an alternative by reformulating the nested optimisation problem into a higher-dimensional, constrained optimisation problem where the constraints are no longer enforced directly but penalised with penalty terms. This chapter introduces a unified framework that encapsulates various lifted training strategies, including the Method of Auxiliary Coordinates, Fenchel Lifted Networks, and Lifted Bregman Training, and demonstrates how diverse architectures, such as Multi-Layer Perceptrons, Residual Neural Networks, and Proximal Neural Networks fit within this structure. By leveraging tools from convex optimisation, particularly Bregman distances, the framework facilitates distributed optimisation, accommodates non-differentiable proximal activations, and can improve the conditioning of the training landscape. We discuss the implementation of these methods using block-coordinate descent strategies, including deterministic implementations enhanced by accelerated and adaptive optimisation techniques, as well as implicit stochastic gradient methods. Furthermore, we explore the application of this framework to inverse problems, detailing methodologies for both the training of specialised networks (e.g., unrolled architectures) and the stable inversion of pre-trained networks. Numerical results on standard imaging tasks validate the effectiveness and stability of the lifted Bregman approach compared to conventional training, particularly for architectures employing proximal activations.
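The lifted reformulation described in the abstract can be illustrated on a toy example. Below is a minimal sketch, assuming a two-layer MLP trained by a quadratic-penalty version of lifted training (the chapter's Bregman-distance penalties generalise this quadratic choice): the auxiliary activation variable `Z1` replaces the hard constraint `Z1 = X @ W1` with a penalty term, and a block-coordinate scheme alternates gradient steps on the auxiliary variables and the layer weights, whose updates then decouple across layers. All names, step sizes, and the penalty weight are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data (illustrative only)
X = rng.normal(size=(50, 4))   # inputs
Y = rng.normal(size=(50, 1))   # targets

# Two-layer network Y ≈ relu(X @ W1) @ W2, lifted via auxiliary variable Z1
W1 = rng.normal(scale=0.1, size=(4, 8))
W2 = rng.normal(scale=0.1, size=(8, 1))
Z1 = X @ W1                    # auxiliary activations, initialised feasibly

rho, lr = 1.0, 1e-3            # penalty weight and step size (illustrative)
relu = lambda z: np.maximum(z, 0.0)

def penalty_objective(W1, W2, Z1):
    # data-fit term + quadratic penalty replacing the constraint Z1 = X @ W1
    return (np.sum((relu(Z1) @ W2 - Y) ** 2)
            + rho * np.sum((Z1 - X @ W1) ** 2))

obj_init = penalty_objective(W1, W2, Z1)

for _ in range(500):
    # Block 1: gradient step on the auxiliary activations Z1
    residual = relu(Z1) @ W2 - Y
    grad_Z1 = 2 * (residual @ W2.T) * (Z1 > 0) + 2 * rho * (Z1 - X @ W1)
    Z1 -= lr * grad_Z1
    # Block 2: weight updates decouple layer by layer (hence parallelisable)
    W1 -= lr * (-2 * rho * X.T @ (Z1 - X @ W1))
    W2 -= lr * (2 * relu(Z1).T @ (relu(Z1) @ W2 - Y))
```

Because the penalty couples only neighbouring layers, each weight block sees a local subproblem, which is what enables the distributed optimisation and proximal (non-differentiable) activation handling discussed in the abstract.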
Problem

Research questions and friction points this paper is trying to address.

Addresses vanishing and exploding gradients in deep networks
Enables distributed optimization for neural network training
Handles non-differentiable activations in inverse problem solving
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified framework combines various lifted training strategies
Uses Bregman distances for distributed optimization and conditioning
Applies block-coordinate descent for training and inversion problems
Xiaoyu Wang
Heriot-Watt University, Edinburgh, UK
Alexandra Valavanis
Queen Mary University of London, London, UK
Azhir Mahmood
University College London, London, UK
Andreas Mang
Associate Professor, Department of Mathematics, University of Houston
Scientific Computing, Numerical Optimization, Inverse Problems, Optimal Control, Data Science
Martin Benning
Professor of Inverse Problems, University College London
Inverse Problems, Optimisation, Machine Learning, Imaging
Audrey Repetti
Heriot-Watt University, Edinburgh, UK