🤖 AI Summary
This paper investigates the necessity of “memorization”—i.e., whether an algorithm’s output must contain a traceable footprint of training samples (m-traceability)—in stochastic convex optimization (SCO) under $\ell_p$ norms. To unify the analysis of memorization and differential privacy (DP) lower bounds, the authors introduce a novel complexity measure: the *trace value*. Methodologically, the work combines $\ell_p$-geometric analysis, a sparse variant of the fingerprinting lemma, and information-theoretic lower-bound techniques. Key contributions are: (1) a precise characterization of the phase transition threshold between m-traceability and excess risk; (2) for $p \in [1,2]$, this threshold coincides exactly with the optimal DP excess risk, establishing a sharp dichotomy between privacy and memorization; and (3) for $p > 2$, it yields the strongest known DP learning lower bounds, partially resolving a long-standing open problem. The results provide foundational insights into the interplay among geometry, generalization, privacy, and memorization in high-dimensional SCO.
📝 Abstract
In this paper, we investigate the necessity of memorization in stochastic convex optimization (SCO) under $\ell_p$ geometries. Informally, we say a learning algorithm memorizes $m$ samples (or is $m$-traceable) if, by analyzing its output, it is possible to identify at least $m$ of its training samples. Our main results uncover a fundamental tradeoff between traceability and excess risk in SCO. For every $p \in [1,\infty)$, we establish the existence of a risk threshold below which any sample-efficient learner must memorize a *constant fraction* of its samples. For $p \in [1,2]$, this threshold coincides with the best risk achievable by differentially private (DP) algorithms, i.e., above this threshold, there are algorithms that do not memorize even a single sample. This establishes a sharp dichotomy between privacy and traceability for $p \in [1,2]$. For $p \in (2,\infty)$, this threshold instead gives novel lower bounds for DP learning, partially closing an open problem in this setting. En route to proving these results, we introduce a complexity notion we term the *trace value* of a problem, which unifies privacy lower bounds and traceability results, and prove a sparse variant of the fingerprinting lemma.
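The tradeoff described in the abstract can be sketched informally as a threshold statement. This is a rough rendering for orientation only: the symbols $\alpha^*_p$, $F$, and $\mathcal{A}$ are placeholder notation, not the paper's exact definitions or rates.

```latex
% Informal sketch of the traceability--risk dichotomy (placeholder notation).
% S: a dataset of n i.i.d. samples; \mathcal{A}(S): the learner's output;
% F: the population objective; \alpha^*_p(n,d): the risk threshold for
% \ell_p-SCO in dimension d with n samples.
\[
  \mathbb{E}\!\left[ F(\mathcal{A}(S)) - \min_{w} F(w) \right] \;<\; \alpha^*_p(n,d)
  \;\;\Longrightarrow\;\;
  \mathcal{A} \text{ is } m\text{-traceable with } m = \Omega(n),
\]
\[
  \text{while for } p \in [1,2], \text{ DP algorithms attain risk } O\!\big(\alpha^*_p(n,d)\big)
  \text{ without being even } 1\text{-traceable.}
\]
```

For $p \in (2,\infty)$ the same threshold does not match a known DP upper bound; instead it translates into new DP lower bounds, which is why the dichotomy is stated only for $p \in [1,2]$.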