Lower Bounds for Public-Private Learning under Distribution Shift

📅 2025-07-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper investigates the fundamental performance limits of differentially private learning that jointly leverages public and private data when the two sources follow mismatched distributions. Using tools from statistical learning theory and information theory, it derives the first sample complexity lower bounds for this distribution-shift setting, specifically for Gaussian mean estimation and Gaussian linear regression. Key results: (i) under small distribution shift, accurate private parameter estimation requires a sufficiently large sample from at least one data source; (ii) under large shift, public data provides no asymptotic benefit to private learning. The analysis characterizes the critical conditions governing the complementarity of public and private data, revealing the theoretical limits of data fusion in privacy-preserving machine learning and establishing foundational guarantees for the feasibility of differentially private learning in realistic non-i.i.d. settings.

📝 Abstract
The most effective differentially private machine learning algorithms in practice rely on an additional source of purportedly public data. This paradigm is most interesting when the two sources combine to be more than the sum of their parts. However, there are settings such as mean estimation where we have strong lower bounds, showing that when the two data sources have the same distribution, there is no complementary value to combining the two data sources. In this work we extend the known lower bounds for public-private learning to the setting where the two data sources exhibit significant distribution shift. Our results apply both to Gaussian mean estimation where the two distributions have different means, and to Gaussian linear regression where the two distributions exhibit parameter shift. We find that when the shift is small (relative to the desired accuracy), either public or private data must be sufficiently abundant to estimate the private parameter. Conversely, when the shift is large, public data provides no benefit.
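As an illustration of the estimation setting (not the paper's construction), the sketch below combines a public empirical mean with a differentially private mean computed via the standard Gaussian mechanism. The clipping bound, the mixing weight w, and the sampled distributions are all illustrative assumptions; the paper's lower bounds concern how much any such combination can gain when the public distribution is shifted.

```python
import numpy as np

def private_mean(x, clip=1.0, eps=1.0, delta=1e-5, rng=None):
    """Gaussian-mechanism estimate of the mean of x under (eps, delta)-DP.

    Clipping each sample to [-clip, clip] bounds the sensitivity of the
    empirical mean by 2 * clip / n; noise is calibrated to that sensitivity.
    """
    rng = np.random.default_rng(rng)
    n = len(x)
    clipped = np.clip(x, -clip, clip)
    sensitivity = 2.0 * clip / n
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / eps
    return clipped.mean() + rng.normal(0.0, sigma)

def combined_estimate(x_pub, x_priv, w, **dp_kwargs):
    """Convex combination of the public empirical mean and a private mean.

    The weight w on the public estimate is a free parameter here, chosen
    purely for illustration.
    """
    return w * np.mean(x_pub) + (1.0 - w) * private_mean(x_priv, **dp_kwargs)

# Toy instance: the public mean is shifted from the private mean by `shift`.
rng = np.random.default_rng(0)
mu_priv, shift = 0.3, 0.05
x_priv = rng.normal(mu_priv, 1.0, size=2000)
x_pub = rng.normal(mu_priv + shift, 1.0, size=2000)
est = combined_estimate(x_pub, x_priv, w=0.5, eps=1.0, rng=1)
print(est)
```

When the shift is small relative to the target accuracy, the public term pulls the estimate only slightly off the private mean; when the shift is large, the public term becomes a source of bias rather than a complement, which is the regime where the paper shows public data stops helping.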
Problem

Research questions and friction points this paper is trying to address.

Known lower bounds for public-private learning assume the two data sources share the same distribution
When does public data complement private data in Gaussian mean estimation if the two distributions have different means?
How does parameter shift in Gaussian linear regression affect the value of public data for private learning?
Innovation

Methods, ideas, or system contributions that make the work stand out.

Derives the first sample complexity lower bounds for public-private learning under distribution shift
Shows that under small shift, either the public or the private sample must be sufficiently large to estimate the private parameter
Shows that under large shift, public data provides no benefit to private estimation