🤖 AI Summary
This paper investigates why two-tower models for unbiased learning-to-rank degrade when trained on click logs, focusing on two explanations: confounding by the logging policy and insufficient model identifiability. Theoretically, we analyze the conditions for parameter identifiability in two-tower models, showing that recovering parameters from clicks requires either document swaps across positions or overlapping feature distributions. We further show that logging policies introduce no bias when the model perfectly captures user behavior, but can amplify bias when the model is misspecified, particularly when prediction errors correlate with document placement across positions. Methodologically, we propose a causal-inference-based sample weighting scheme that explicitly corrects for confounding introduced by the logging policy. Experiments demonstrate that this weighting strategy substantially mitigates performance degradation, yielding stable NDCG@10 improvements of 2.3–4.1% across multiple industrial datasets. Our core contributions are (i) a formal theoretical framework characterizing identifiability in two-tower ranking models, and (ii) a practical, deployable bias-mitigation solution grounded in causal principles.
📝 Abstract
Additive two-tower models are popular learning-to-rank methods for handling biased user feedback in industry settings. Recent studies, however, report a concerning phenomenon: training two-tower models on clicks collected by well-performing production systems leads to decreased ranking performance. This paper investigates two recent explanations for this observation: confounding effects from logging policies and model identifiability issues. We theoretically analyze the identifiability conditions of two-tower models, showing that either document swaps across positions or overlapping feature distributions are required to recover model parameters from clicks. We also investigate the effect of logging policies on two-tower models, finding that they introduce no bias when models perfectly capture user behavior. However, logging policies can amplify biases when models imperfectly capture user behavior, particularly when prediction errors correlate with document placement across positions. We propose a sample weighting technique to mitigate these effects and provide actionable insights for researchers and practitioners using two-tower models.
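The identifiability claim can be illustrated with a minimal sketch, assuming a tabular additive two-tower model in which the click probability is the sigmoid of a per-document relevance logit plus a per-position bias logit. The document names, logit values, and the helper functions `click_probability` and `predict` are hypothetical and chosen only for illustration; this is not the paper's implementation. When every document is logged at a single position, an arbitrary per-position shift can be moved between the two towers without changing any predicted click probability, so clicks alone cannot pin down the parameters. A single document swap breaks this equivalence for the positions it connects.

```python
import math


def click_probability(relevance_logit, position_logit):
    """Additive two-tower click model: sigmoid of the summed tower logits."""
    return 1.0 / (1.0 + math.exp(-(relevance_logit + position_logit)))


def predict(log, relevance, position_bias):
    """Predicted click probability for each (document, position) pair in the log."""
    return [click_probability(relevance[doc], position_bias[pos]) for doc, pos in log]


# Hypothetical "true" parameters of the two towers.
relevance = {"a": 1.0, "b": 0.2, "c": -0.5}
position_bias = {1: 0.0, 2: -0.7, 3: -1.5}

# Deterministic logging policy without swaps: each document is shown at one position.
log_no_swaps = [("a", 1), ("b", 2), ("c", 3)]
position_of = {doc: pos for doc, pos in log_no_swaps}

# An arbitrary per-position shift, added to the relevance tower and
# subtracted from the position tower.
shift = {1: 0.0, 2: 0.4, 3: -0.9}
alt_relevance = {doc: relevance[doc] + shift[position_of[doc]] for doc in relevance}
alt_position_bias = {pos: position_bias[pos] - shift[pos] for pos in position_bias}

# Without swaps, both parameter sets explain the logged clicks equally well.
same_without_swaps = all(
    math.isclose(p, q)
    for p, q in zip(
        predict(log_no_swaps, relevance, position_bias),
        predict(log_no_swaps, alt_relevance, alt_position_bias),
    )
)

# Showing document "a" at position 2 as well (a swap) ties positions 1 and 2 together.
log_with_swap = log_no_swaps + [("a", 2)]
same_with_swap = all(
    math.isclose(p, q)
    for p, q in zip(
        predict(log_with_swap, relevance, position_bias),
        predict(log_with_swap, alt_relevance, alt_position_bias),
    )
)

print(same_without_swaps, same_with_swap)
```

In this sketch the swap forces the shifts at positions 1 and 2 to agree, which is why the alternative parameters no longer reproduce the predictions. The overlapping-feature-distribution route to identifiability mentioned in the abstract is not covered by this tabular example.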