On the Dichotomy Between Privacy and Traceability in $\ell_p$ Stochastic Convex Optimization

📅 2025-02-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper investigates the necessity of "memorization"—i.e., whether an algorithm's output must contain a traceable footprint of training samples (m-traceability)—in stochastic convex optimization (SCO) under $\ell_p$ norms. To unify the analysis of memorization and differential privacy (DP) lower bounds, the authors introduce a novel complexity measure: the *trace value*. Methodologically, the work combines $\ell_p$-geometric analysis, a sparse variant of the fingerprinting lemma, and information-theoretic lower-bound techniques. Key contributions are: (1) a precise characterization of the phase transition threshold between m-traceability and excess risk; (2) for $p \in [1,2]$, this threshold coincides exactly with the optimal DP excess risk, establishing a sharp dichotomy between privacy and memorization; and (3) for $p > 2$, it yields the strongest known DP learning lower bounds, partially resolving a long-standing open problem. The results provide foundational insights into the interplay among geometry, generalization, privacy, and memorization in high-dimensional SCO.
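Fingerprinting lemmas of the kind mentioned above underpin tracing attacks with the following flavor. The sketch below is not the paper's construction; the $\{\pm 1\}$ product distribution, the exact-empirical-mean "learner", and the correlation-based score are standard fingerprinting ingredients, chosen here only to illustrate why an algorithm that answers too accurately must admit a tracer that identifies training samples:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 2000  # sample size and dimension (illustrative values)

# Population: coordinate j has mean p_j drawn uniformly from [-1, 1];
# each data coordinate is +/-1 with that mean (a classic fingerprinting setup).
p = rng.uniform(-1.0, 1.0, size=d)
members = np.where(rng.random((n, d)) < (1 + p) / 2, 1.0, -1.0)
nonmembers = np.where(rng.random((n, d)) < (1 + p) / 2, 1.0, -1.0)

# A "learner" that memorizes: it outputs the exact empirical mean of its sample.
q = members.mean(axis=0)

def trace_score(z):
    # Correlation of a candidate point with the output's deviation from the
    # population mean: training samples correlate with q beyond the population
    # baseline, fresh points do not.
    return float(z @ (q - p))

member_scores = np.array([trace_score(z) for z in members])
nonmember_scores = np.array([trace_score(z) for z in nonmembers])

# Training samples score markedly higher than fresh samples, so thresholding
# the score identifies a constant fraction of the training set.
print(member_scores.mean(), nonmember_scores.mean())
```

In expectation the member score is about $2d/(3n)$ while a fresh point scores $0$, so for $d \gg n$ the two populations separate cleanly; DP noise added to the mean is exactly what destroys this correlation, which is the intuition behind the privacy–traceability dichotomy.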

📝 Abstract
In this paper, we investigate the necessity of memorization in stochastic convex optimization (SCO) under $\ell_p$ geometries. Informally, we say a learning algorithm memorizes $m$ samples (or is $m$-traceable) if, by analyzing its output, it is possible to identify at least $m$ of its training samples. Our main results uncover a fundamental tradeoff between traceability and excess risk in SCO. For every $p \in [1,\infty)$, we establish the existence of a risk threshold below which any sample-efficient learner must memorize a *constant fraction* of its samples. For $p \in [1,2]$, this threshold coincides with the best risk of differentially private (DP) algorithms, i.e., above this threshold there are algorithms that do not memorize even a single sample. This establishes a sharp dichotomy between privacy and traceability for $p \in [1,2]$. For $p \in (2,\infty)$, this threshold instead gives novel lower bounds for DP learning, partially closing an open problem in this setting. En route to proving these results, we introduce a complexity notion we term the *trace value* of a problem, which unifies privacy lower bounds and traceability results, and we prove a sparse variant of the fingerprinting lemma.
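The informal definition of $m$-traceability above can be sketched formally. The following is one plausible formalization, consistent with the abstract's phrasing; the symbols $\mathcal{A}$ (algorithm), $S$ (sample), and $\mathcal{T}$ (tracer) are illustrative, not the paper's notation:

```latex
% A is m-traceable if some tracer T, given A's output, recovers at least
% m genuine training samples with constant probability:
\[
  \exists\, \mathcal{T}:\quad
  \Pr_{S \sim \mathcal{D}^n}\!\Big[
    \big|\, \mathcal{T}\big(\mathcal{A}(S)\big) \cap S \,\big| \;\ge\; m
  \Big] \;\ge\; \Omega(1),
\]
% subject to the tracer rarely flagging fresh points z' ~ D outside S
% (bounded false positives), so that recovery reflects genuine memorization.
```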
Problem

Research questions and friction points this paper is trying to address.

Explores the necessity of memorization in stochastic convex optimization.
Identifies a fundamental tradeoff between traceability and excess risk.
Establishes a sharp dichotomy between privacy and traceability under $\ell_p$ geometries.
Innovation

Methods, ideas, or system contributions that make the work stand out.

A characterization of memorization (m-traceability) in SCO
A quantified privacy–traceability tradeoff
The *trace value*, a complexity measure unifying privacy and traceability lower bounds
A sparse variant of the fingerprinting lemma
Authors
S. Voitovych (Institute for Data, Systems, and Society, Massachusetts Institute of Technology)
Mahdi Haghifam (Khoury College of Computer Sciences, Northeastern University)
Idan Attias (Postdoctoral Researcher, IDEAL Institute (UIC and TTIC))
Topics: Machine learning, Learning theory, Theoretical computer science
G. Dziugaite (Google DeepMind)
Roi Livni (Department of Electrical Engineering, Tel Aviv University)
Daniel M. Roy (Research Director, Vector Institute; Prof., U. Toronto (Statistics, CS))
Topics: Machine learning, Trustworthy AI, Mathematical Statistics, Learning Theory, Theoretical CS