🤖 AI Summary
This work addresses conversion prediction in online advertising under strict privacy constraints, where each conversion can be linked only to a candidate set of clicks (an attribution set) rather than to the single click that caused it. The authors construct an unbiased estimator of the population loss for this attribution-set setting, using both the structure of the attribution sets and a prior distribution over the candidate clicks, and then learn by empirical risk minimization on that estimator. The resulting generalization guarantees improve as the prior becomes more informative and remain robust to errors in estimating the prior. Empirical evaluations on standard datasets suggest that the method significantly outperforms common industry heuristics, particularly when attribution sets are large or overlapping.
📝 Abstract
We address the problem of training conversion prediction models in advertising domains under privacy constraints, where direct links between ad clicks and conversions are unavailable. Motivated by privacy-preserving browser APIs and the deprecation of third-party cookies, we study a setting where the learner observes a sequence of clicks and a sequence of conversions, but can only link a conversion to a set of candidate clicks (an attribution set) rather than a unique source. We formalize this as learning from attribution sets generated by an oblivious adversary equipped with a prior distribution over the candidates. Despite the lack of explicit labels, we construct an unbiased estimator of the population loss from these coarse signals via a novel approach. Leveraging this estimator, we show that Empirical Risk Minimization achieves generalization guarantees that scale with the informativeness of the prior and is also robust against estimation errors in the prior, despite complex dependencies among attribution sets. Simple empirical evaluations on standard datasets suggest our unbiased approach significantly outperforms common industry heuristics, particularly in regimes where attribution sets are large or overlapping.
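To make the setting concrete, the following is a minimal sketch of the data the learner sees and of one plausible fractional-credit heuristic. It is not the unbiased estimator described above, and it is not necessarily one of the heuristics evaluated in the paper. The names `soft_labels` and `log_loss`, the dictionary representation of an attribution set, and the toy numbers are all assumptions made for illustration.

```python
import math

def soft_labels(num_clicks, attribution_sets):
    """Spread each conversion over its candidate clicks in proportion to the prior.

    attribution_sets: one dict per conversion, mapping click index -> prior weight.
    (Hypothetical representation; the source does not specify a data format.)
    """
    labels = [0.0] * num_clicks
    for candidates in attribution_sets:
        total = sum(candidates.values())
        for click, weight in candidates.items():
            labels[click] += weight / total  # normalized prior mass for this click
    # A click converts at most once in this toy model, so cap the credit at 1.
    return [min(1.0, y) for y in labels]

def log_loss(preds, labels):
    """Average binary cross-entropy against (possibly fractional) labels."""
    eps = 1e-12  # guards against log(0)
    return -sum(
        y * math.log(max(p, eps)) + (1 - y) * math.log(max(1 - p, eps))
        for p, y in zip(preds, labels)
    ) / len(preds)

# Four clicks, two conversions. Click 1 appears in both attribution sets,
# which is the overlapping regime mentioned in the abstract.
sets = [{0: 0.5, 1: 0.5}, {1: 0.25, 2: 0.75}]
labels = soft_labels(4, sets)   # [0.5, 0.75, 0.75, 0.0]
loss = log_loss([0.5, 0.5, 0.5, 0.5], labels)
```

Here the overlapping click accumulates credit from both conversions, which shows how overlapping attribution sets interact under a simple heuristic like this one.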