Position: Curvature Matrices Should Be Democratized via Linear Operators

📅 2025-01-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
In machine learning, Hessian and other curvature matrices suffer from high computational complexity, lack of unified representation, and poor scalability, severely hindering applications such as second-order optimization, uncertainty quantification, and model compression. To address this, we propose a unified abstraction paradigm based on linear operators—specifically, matrix-vector multiplication interfaces—and systematically develop the first scalable, user-friendly curvature matrix representation framework. Building upon this paradigm, we introduce *curvlinops*, an open-source PyTorch library that supports automatic differentiation, incorporates structural priors (e.g., sparsity, Kronecker factorization), and maintains compatibility with major deep learning frameworks. Experiments demonstrate that our approach drastically simplifies implementation across curvature-driven tasks, enabling efficient, memory-bounded, and scalable computation even on models with tens of billions of parameters. This advances the practical adoption and democratization of curvature-based methods in large-scale deep learning.
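The matrix-vector multiplication interface the summary refers to can be illustrated with plain PyTorch autograd: a Hessian-vector product costs one extra backward pass and never materializes the Hessian. This is a generic sketch, not curvlinops code; the toy model and data are illustrative.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(3, 1)           # toy model, stand-in for a real NN
X, y = torch.randn(8, 3), torch.randn(8, 1)
params = list(model.parameters())

loss = torch.nn.functional.mse_loss(model(X), y)
# create_graph=True keeps the graph so we can differentiate the gradient again
grads = torch.autograd.grad(loss, params, create_graph=True)

# Multiply the Hessian with a vector v (stored in parameter shapes)
# without ever forming the D x D matrix: one extra backward pass.
v = [torch.randn_like(p) for p in params]
Hv = torch.autograd.grad(grads, params, grad_outputs=v)
```

Everything downstream (optimizers, spectral analyses, Laplace approximations) only needs this product, which is what makes the abstraction scale.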

📝 Abstract
Structured large matrices are prevalent in machine learning. A particularly important class is curvature matrices like the Hessian, which are central to understanding the loss landscape of neural nets (NNs), and enable second-order optimization, uncertainty quantification, model pruning, data attribution, and more. However, curvature computations can be challenging due to the complexity of automatic differentiation, and the variety and structural assumptions of curvature proxies, like sparsity and Kronecker factorization. In this position paper, we argue that linear operators -- an interface for performing matrix-vector products -- provide a general, scalable, and user-friendly abstraction to handle curvature matrices. To support this position, we developed *curvlinops*, a library that provides curvature matrices through a unified linear operator interface. We demonstrate with *curvlinops* how this interface can hide complexity, simplify applications, be extensible and interoperable with other libraries, and scale to large NNs.
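Because a matrix-vector product is all that SciPy's sparse linear algebra needs, a curvature operator in this style plugs directly into off-the-shelf routines such as `eigsh` for leading Hessian eigenvalues. The sketch below uses generic autograd code, not the curvlinops API; the model and data are toy placeholders.

```python
import numpy as np
import torch
from scipy.sparse.linalg import LinearOperator, eigsh

torch.manual_seed(0)
model = torch.nn.Linear(3, 1)           # toy model, stand-in for a real NN
X, y = torch.randn(8, 3), torch.randn(8, 1)
params = [p for p in model.parameters() if p.requires_grad]
D = sum(p.numel() for p in params)      # total parameter count (here 4)

def hvp(v_flat):
    """Hessian-vector product of the MSE loss via double backpropagation."""
    v = torch.from_numpy(np.asarray(v_flat, dtype=np.float32)).reshape(-1)
    vs, i = [], 0
    for p in params:                    # unflatten v into parameter shapes
        vs.append(v[i:i + p.numel()].view_as(p))
        i += p.numel()
    loss = torch.nn.functional.mse_loss(model(X), y)
    grads = torch.autograd.grad(loss, params, create_graph=True)
    Hv = torch.autograd.grad(grads, params, grad_outputs=vs)
    return torch.cat([h.reshape(-1) for h in Hv]).detach().numpy()

H = LinearOperator(shape=(D, D), matvec=hvp)    # Hessian, never materialized
top = eigsh(H, k=1, return_eigenvectors=False)  # leading eigenvalue
```

The same operator object could be handed to iterative solvers (e.g. conjugate gradients) or preconditioned methods unchanged, which is the interoperability argument the abstract makes.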
Problem

Research questions and friction points this paper is trying to address.

Machine Learning
Matrix Computations
Curvature Matrices
Innovation

Methods, ideas, or system contributions that make the work stand out.

Linear Operator Methods
Neural Network Curvature
curvlinops Library