🤖 AI Summary
This paper investigates the necessity of “memorization”—i.e., whether an algorithm’s output must contain a traceable footprint of training samples (m-traceability)—in stochastic convex optimization (SCO) under $\ell_p$ norms. To unify the analysis of memorization and differential privacy (DP) lower bounds, the authors introduce a novel complexity measure: the *trace value*. Methodologically, the work combines $\ell_p$-geometric analysis, a sparse variant of the fingerprinting lemma, and information-theoretic lower-bound techniques. Key contributions are: (1) a precise characterization of the phase transition threshold between m-traceability and excess risk; (2) for $p \in [1,2]$, this threshold coincides exactly with the optimal DP excess risk, establishing a sharp dichotomy between privacy and memorization; and (3) for $p > 2$, it yields the strongest known DP learning lower bounds, partially resolving a long-standing open problem. The results provide foundational insights into the interplay among geometry, generalization, privacy, and memorization in high-dimensional SCO.
📝 Abstract
In this paper, we investigate the necessity of memorization in stochastic convex optimization (SCO) under $\ell_p$ geometries. Informally, we say a learning algorithm memorizes $m$ samples (or is $m$-traceable) if, by analyzing its output, it is possible to identify at least $m$ of its training samples. Our main results uncover a fundamental tradeoff between traceability and excess risk in SCO. For every $p \in [1,\infty)$, we establish the existence of a risk threshold below which any sample-efficient learner must memorize a *constant fraction* of its samples. For $p \in [1,2]$, this threshold coincides with the best risk achievable by differentially private (DP) algorithms, i.e., above this threshold, there are algorithms that do not memorize even a single sample. This establishes a sharp dichotomy between privacy and traceability for $p \in [1,2]$. For $p \in (2,\infty)$, this threshold instead gives novel lower bounds for DP learning, partially closing an open problem in this setting. En route to proving these results, we introduce a complexity notion we term the *trace value* of a problem, which unifies privacy lower bounds and traceability results, and prove a sparse variant of the fingerprinting lemma.
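The tradeoff described in the abstract can be sketched informally as a threshold statement. This is a rough rendering for orientation only: the symbols $\alpha^*_p$, $F$, and $\mathcal{A}$ are placeholder notation, not the paper's exact definitions or rates.

```latex
% Informal sketch of the traceability--risk dichotomy (placeholder notation).
% S: a dataset of n i.i.d. samples; \mathcal{A}(S): the learner's output;
% F: the population objective; \alpha^*_p(n,d): the risk threshold for
% \ell_p-SCO in dimension d with n samples.
\[
  \mathbb{E}\!\left[ F(\mathcal{A}(S)) - \min_{w} F(w) \right] \;<\; \alpha^*_p(n,d)
  \;\;\Longrightarrow\;\;
  \mathcal{A} \text{ is } m\text{-traceable with } m = \Omega(n),
\]
\[
  \text{while for } p \in [1,2], \text{ DP algorithms attain risk } O\!\big(\alpha^*_p(n,d)\big)
  \text{ without being even } 1\text{-traceable.}
\]
```

For $p \in (2,\infty)$ the same threshold does not match a known DP upper bound; instead it translates into new DP lower bounds, which is why the dichotomy is stated only for $p \in [1,2]$.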