🤖 AI Summary
In resource allocation, agents may strategically misreport their attributes to influence outcomes, complicating the distinction between genuine behavioral change and strategic manipulation.
Method: We propose the first causally identifiable definition and estimation of the misreporting rate—leveraging its defining asymmetry: misreports lack causal downstream effects. Our approach integrates do-calculus, counterfactual modeling, and comparative estimation using manipulated versus non-manipulated data, supported by asymptotic variance analysis and semi-synthetic validation.
Contribution/Results: We establish theoretical consistency and optimal convergence rates for our estimator. Empirical evaluation on real Medicare data and semi-synthetic benchmarks demonstrates a 23% reduction in misclassification error over baselines, significantly improving quantification accuracy of strategic behavior. This work introduces the first causally grounded, identifiable framework for measuring misreporting—providing both theoretical foundations and practical tools for fair, robust, strategy-aware resource allocation.
📝 Abstract
In settings where ML models are used to inform the allocation of resources, agents affected by the allocation decisions might have an incentive to strategically change their features to secure better outcomes. While prior work has studied strategic responses broadly, disentangling misreporting from genuine modification remains a fundamental challenge. In this paper, we propose a causally-motivated approach to identify and quantify how much an agent misreports on average by distinguishing deceptive changes in their features from genuine modification. Our key insight is that, unlike genuine modification, misreported features do not causally affect downstream variables (i.e., causal descendants). We exploit this asymmetry by comparing the causal effect of misreported features on their causal descendants as derived from manipulated datasets against those from unmanipulated datasets. We formally prove identifiability of the misreporting rate and characterize the variance of our estimator. We empirically validate our theoretical results using a semi-synthetic and real Medicare dataset with misreported data, demonstrating that our approach can be employed to identify misreporting in real-world scenarios.