Statistical Learning from Attribution Sets

📅 2026-02-06

🤖 AI Summary

This work addresses conversion prediction in online advertising under strict privacy constraints. In this setting, each conversion can only be linked to a candidate set of clicks (an attribution set), not to the specific click that produced it. The authors propose an unbiased loss estimation method tailored to this attribution-set setting. They construct an unbiased estimator of the population risk that uses both the structure of the attribution sets and a prior distribution over the candidate clicks, and then learn by empirical risk minimization on this estimator. The approach is robust to errors in the estimated prior, and its generalization guarantees improve as the prior becomes more informative. Empirical evaluations on standard datasets suggest that the method significantly outperforms common industry heuristics, with the largest gains when attribution sets are large or highly overlapping.

📝 Abstract

We address the problem of training conversion prediction models in advertising domains under privacy constraints, where direct links between ad clicks and conversions are unavailable. Motivated by privacy-preserving browser APIs and the deprecation of third-party cookies, we study a setting where the learner observes a sequence of clicks and a sequence of conversions, but can only link a conversion to a set of candidate clicks (an attribution set) rather than a unique source. We formalize this as learning from attribution sets generated by an oblivious adversary equipped with a prior distribution over the candidates. Despite the lack of explicit labels, we construct an unbiased estimator of the population loss from these coarse signals via a novel approach. Leveraging this estimator, we show that Empirical Risk Minimization achieves generalization guarantees that scale with the informativeness of the prior and is also robust against estimation errors in the prior, despite complex dependencies among attribution sets. Simple empirical evaluations on standard datasets suggest our unbiased approach significantly outperforms common industry heuristics, particularly in regimes where attribution sets are large or overlapping.
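The abstract does not spell out the estimator. The sketch below only shows why an unbiased loss estimate is possible at all, using a simplified model that is an assumption here and not necessarily the paper's. In this model, each conversion's true source is drawn from its attribution set according to a known prior, draws are independent across conversions, and no click is the source of more than one conversion. For a binary label the loss is linear in y, ℓ(p, y) = ℓ(p, 0) + y(ℓ(p, 1) − ℓ(p, 0)). Replacing each click's label by the total prior mass it receives therefore gives an unbiased estimate of the loss under the true labels. This prior-weighted soft-label device is a swapped-in textbook construction. The paper's estimator also handles prior misestimation and dependencies among overlapping sets, and may differ. The names `prior_weighted_risk` and `exact_expected_risk` are hypothetical. The second function brute-forces the expectation to check the first on a toy example.

```python
import itertools
import math
from typing import Callable, Dict, Sequence

# Hypothetical data layout (an assumption, not the paper's format):
#   clicks: one scalar feature per click
#   attribution_sets: one dict per conversion, {click_index: prior_weight},
#   with weights summing to 1 = P(this click is the conversion's true source)
Loss = Callable[[float, int], float]


def log_loss(p: float, y: int) -> float:
    # Binary cross-entropy for a predicted conversion probability p.
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def prior_weighted_risk(clicks: Sequence[float],
                        attribution_sets: Sequence[Dict[int, float]],
                        predict: Callable[[float], float],
                        loss: Loss = log_loss) -> float:
    # Expected label of each click = prior mass it receives across conversions.
    soft_y = [0.0] * len(clicks)
    for weights in attribution_sets:
        for i, w in weights.items():
            soft_y[i] += w
    total = 0.0
    for x, y_bar in zip(clicks, soft_y):
        p = predict(x)
        l0, l1 = loss(p, 0), loss(p, 1)
        # Linear-in-y form of the loss, evaluated at the expected label.
        total += l0 + y_bar * (l1 - l0)
    return total / len(clicks)


def exact_expected_risk(clicks: Sequence[float],
                        attribution_sets: Sequence[Dict[int, float]],
                        predict: Callable[[float], float],
                        loss: Loss = log_loss) -> float:
    # Enumerate every joint choice of true sources (one per conversion),
    # weighted by the product of prior probabilities (independent draws).
    # Exact only if no click is picked twice, e.g. for disjoint sets.
    options = [list(ws.items()) for ws in attribution_sets]
    expected = 0.0
    for choice in itertools.product(*options):
        prob = math.prod(w for _, w in choice)
        labels = [0] * len(clicks)
        for i, _ in choice:
            labels[i] = 1
        risk = sum(loss(predict(x), y) for x, y in zip(clicks, labels)) / len(clicks)
        expected += prob * risk
    return expected


def predict(x: float) -> float:
    # Toy logistic model with fixed, made-up parameters.
    return 1.0 / (1.0 + math.exp(-(0.8 * x - 0.2)))


clicks = [0.5, -1.0, 2.0, 0.3, 1.2]
# Two conversions with disjoint candidate sets; the weights are an assumed prior.
attribution_sets = [{0: 0.7, 1: 0.3}, {2: 0.5, 3: 0.25, 4: 0.25}]
est = prior_weighted_risk(clicks, attribution_sets, predict)
ref = exact_expected_risk(clicks, attribution_sets, predict)
print(math.isclose(est, ref))  # True
```

The brute-force check grows exponentially with the number of conversions, so it only serves to validate the identity on toy inputs. The prior-weighted estimate itself takes a single pass over the attribution sets. Its unbiasedness depends entirely on the prior being correct, and that is the regime where the paper's robustness results become relevant.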

Problem

Research questions and friction points this paper is trying to address.

attribution sets

conversion prediction

privacy constraints

unlabeled learning

advertising

Innovation

Methods, ideas, or system contributions that make the work stand out.

attribution sets

unbiased estimator

privacy-preserving learning