🤖 AI Summary
This work proposes LatentPre, a fairness-aware data preprocessing framework that addresses the sharp performance degradation existing methods suffer when the attribute space is incomplete or critical variables are missing. LatentPre is the first to incorporate identifiable latent attributes into fairness preprocessing, using a tailored Expectation-Maximization (EM) algorithm to estimate missing sensitive or legitimate variables. By operating in this enriched attribute space, the framework disentangles spurious bias from legitimate associations, enabling robust data calibration even under imperfect observational conditions and substantially improving the trade-off between fairness and utility across multiple real-world scenarios. As a result, LatentPre makes fair data management more practical and adaptable in settings where key attributes are unobserved or only partially available.
📝 Abstract
Fair data pre-processing is a widely used strategy for mitigating bias in machine learning. A promising line of research focuses on calibrating datasets to satisfy a designed fairness policy so that sensitive attributes influence outcomes only through clearly specified legitimate causal pathways. While effective on clean and information-rich data, these methods often break down in real-world scenarios with imperfect attribute spaces, where decision-relevant factors may be deemed unusable or even missing. To address this gap, we propose LatentPre, a novel framework that enables principled and robust fair data processing in practical settings. Instead of relying solely on observed attributes, LatentPre augments the fairness policy with latent attributes that capture essential but subtle signals, enabling the framework to operate as if the attribute space were perfect. These latent attributes are strategically introduced to guarantee identifiability and are estimated using a tailored expectation-maximization paradigm. The raw data is then carefully refined to conform to this latent-augmented policy, effectively removing biased patterns while preserving justifiable ones. Extensive experiments demonstrate that LatentPre consistently achieves strong fairness-utility trade-offs across diverse scenarios, advancing practical fairness-aware data management.
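To make the expectation-maximization idea above concrete, the sketch below infers a latent binary attribute from a single observed feature by fitting a two-component Gaussian mixture with EM. This is a generic, minimal stand-in, not the paper's tailored algorithm: the function name `em_latent_binary`, the mixture-of-Gaussians assumption, and the one-feature setup are all illustrative, whereas LatentPre operates over a richer attribute space with identifiability guarantees.

```python
import numpy as np

def em_latent_binary(x, n_iter=200, tol=1e-8):
    """Infer a latent binary attribute Z from an observed feature x by
    fitting a two-component Gaussian mixture with EM.

    Returns (gamma, mu): gamma[i] = posterior P(Z=1 | x[i]) and the two
    estimated component means. Illustrative only -- not the paper's
    tailored EM, which works on a latent-augmented attribute space.
    """
    x = np.asarray(x, dtype=float)
    pi = 0.5                              # mixing weight P(Z=1)
    mu = np.quantile(x, [0.25, 0.75])     # spread-out initial means
    var = np.array([x.var(), x.var()])    # initial per-component variances
    ll_old = -np.inf
    for _ in range(n_iter):
        # E-step: soft responsibility of each component for each sample
        p0 = (1 - pi) * np.exp(-(x - mu[0])**2 / (2 * var[0])) \
             / np.sqrt(2 * np.pi * var[0])
        p1 = pi * np.exp(-(x - mu[1])**2 / (2 * var[1])) \
             / np.sqrt(2 * np.pi * var[1])
        gamma = p1 / (p0 + p1)
        # M-step: re-estimate parameters from the soft assignments
        pi = gamma.mean()
        n1, n0 = gamma.sum(), (1 - gamma).sum()
        mu = np.array([np.sum((1 - gamma) * x) / n0,
                       np.sum(gamma * x) / n1])
        var = np.array([np.sum((1 - gamma) * (x - mu[0])**2) / n0,
                        np.sum(gamma * (x - mu[1])**2) / n1])
        var = np.maximum(var, 1e-8)       # guard against variance collapse
        # Stop once the log-likelihood no longer improves
        ll = np.log(p0 + p1).sum()
        if ll - ll_old < tol:
            break
        ll_old = ll
    return gamma, mu
```

In a preprocessing pipeline of this kind, the posterior `gamma` would stand in for the unobserved attribute, letting downstream calibration treat the attribute space as if it were complete.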