Position: Curvature Matrices Should Be Democratized via Linear Operators

📅 2025-01-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
In machine learning, Hessian and other curvature matrices suffer from high computational complexity, lack of unified representation, and poor scalability, severely hindering applications such as second-order optimization, uncertainty quantification, and model compression. To address this, we propose a unified abstraction paradigm based on linear operators—specifically, matrix-vector multiplication interfaces—and systematically develop the first scalable, user-friendly curvature matrix representation framework. Building upon this paradigm, we introduce *curvlinops*, an open-source PyTorch library that supports automatic differentiation, incorporates structural priors (e.g., sparsity, Kronecker factorization), and maintains compatibility with major deep learning frameworks. Experiments demonstrate that our approach drastically simplifies implementation across curvature-driven tasks, enabling efficient, memory-bounded, and scalable computation even on models with tens of billions of parameters. This advances the practical adoption and democratization of curvature-based methods in large-scale deep learning.
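The matrix-vector multiplication interface the summary refers to can be illustrated with plain PyTorch autograd: a Hessian-vector product costs one extra backward pass and never materializes the Hessian. This is a generic sketch, not curvlinops code; the toy model and data are illustrative.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(3, 1)           # toy model, stand-in for a real NN
X, y = torch.randn(8, 3), torch.randn(8, 1)
params = list(model.parameters())

loss = torch.nn.functional.mse_loss(model(X), y)
# create_graph=True keeps the graph so we can differentiate the gradient again
grads = torch.autograd.grad(loss, params, create_graph=True)

# Multiply the Hessian with a vector v (stored in parameter shapes)
# without ever forming the D x D matrix: one extra backward pass.
v = [torch.randn_like(p) for p in params]
Hv = torch.autograd.grad(grads, params, grad_outputs=v)
```

Everything downstream (optimizers, spectral analyses, Laplace approximations) only needs this product, which is what makes the abstraction scale.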

📝 Abstract
Structured large matrices are prevalent in machine learning. A particularly important class is curvature matrices like the Hessian, which are central to understanding the loss landscape of neural nets (NNs), and enable second-order optimization, uncertainty quantification, model pruning, data attribution, and more. However, curvature computations can be challenging due to the complexity of automatic differentiation, and the variety and structural assumptions of curvature proxies, like sparsity and Kronecker factorization. In this position paper, we argue that linear operators -- an interface for performing matrix-vector products -- provide a general, scalable, and user-friendly abstraction to handle curvature matrices. To support this position, we developed *curvlinops*, a library that provides curvature matrices through a unified linear operator interface. We demonstrate with *curvlinops* how this interface can hide complexity, simplify applications, be extensible and interoperable with other libraries, and scale to large NNs.
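Because a matrix-vector product is all that SciPy's sparse linear algebra needs, a curvature operator in this style plugs directly into off-the-shelf routines such as `eigsh` for leading Hessian eigenvalues. The sketch below uses generic autograd code, not the curvlinops API; the model and data are toy placeholders.

```python
import numpy as np
import torch
from scipy.sparse.linalg import LinearOperator, eigsh

torch.manual_seed(0)
model = torch.nn.Linear(3, 1)           # toy model, stand-in for a real NN
X, y = torch.randn(8, 3), torch.randn(8, 1)
params = [p for p in model.parameters() if p.requires_grad]
D = sum(p.numel() for p in params)      # total parameter count (here 4)

def hvp(v_flat):
    """Hessian-vector product of the MSE loss via double backpropagation."""
    v = torch.from_numpy(np.asarray(v_flat, dtype=np.float32)).reshape(-1)
    vs, i = [], 0
    for p in params:                    # unflatten v into parameter shapes
        vs.append(v[i:i + p.numel()].view_as(p))
        i += p.numel()
    loss = torch.nn.functional.mse_loss(model(X), y)
    grads = torch.autograd.grad(loss, params, create_graph=True)
    Hv = torch.autograd.grad(grads, params, grad_outputs=vs)
    return torch.cat([h.reshape(-1) for h in Hv]).detach().numpy()

H = LinearOperator(shape=(D, D), matvec=hvp)    # Hessian, never materialized
top = eigsh(H, k=1, return_eigenvectors=False)  # leading eigenvalue
```

The same operator object could be handed to iterative solvers (e.g. conjugate gradients) or preconditioned methods unchanged, which is the interoperability argument the abstract makes.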
Problem

Research questions and friction points this paper is trying to address.

Machine Learning
Matrix Computations
Curvature Matrices
Innovation

Methods, ideas, or system contributions that make the work stand out.

Linear Operator Methods
Neural Network Curvature
curvlinops Library