Practical Performative Policy Learning with Strategic Agents

📅 2024-12-02
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the endogenous distribution shift that arises when strategic agents manipulate their features to improve their outcomes under a deployed policy. Existing approaches rely on strong parametric assumptions, such as explicit micro-level utility models and macro-level distribution maps, which limit scalability. We formulate performative policy learning as a causal inference task under weak assumptions, uncovering a low-dimensional structure in the induced distribution shift. To circumvent stringent parametric constraints on both utility and distribution, we construct a mediator on the causal path from the deployed model to the shifted data and use a differentiable classifier in place of the high-dimensional distribution map. Combining bounded rationality modeling, gradient-based policy optimization, and characterization of limited manipulation patterns, our method achieves high sample efficiency in high-dimensional settings, substantially outperforms bandit-feedback and zero-order optimization baselines, and comes with theoretical convergence guarantees.

📝 Abstract
This paper studies the performative policy learning problem, where agents adjust their features in response to a released policy to improve their potential outcomes, inducing an endogenous distribution shift. There has been growing interest in training machine learning models in strategic environments, including strategic classification and performative prediction. However, existing approaches often rely on restrictive parametric assumptions: micro-level utility models in strategic classification and macro-level data distribution maps in performative prediction, severely limiting scalability and generalizability. We approach this problem as a complex causal inference task, relaxing parametric assumptions on both micro-level agent behavior and macro-level data distribution. Leveraging bounded rationality, we uncover a practical low-dimensional structure in distribution shifts and construct an effective mediator in the causal path from the deployed model to the shifted data. We then propose a gradient-based policy optimization algorithm with a differentiable classifier as a substitute for the high-dimensional distribution map. Our algorithm efficiently utilizes batch feedback and limited manipulation patterns. Our approach achieves high sample efficiency compared to methods reliant on bandit feedback or zero-order optimization. We also provide theoretical guarantees for algorithmic convergence. Extensive and challenging experiments on high-dimensional settings demonstrate our method's practical efficacy.
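The abstract's core recipe (model agents' boundedly rational response as a differentiable map, then run gradient-based policy optimization through that map rather than treating feedback as a black box) can be sketched in a few lines. Everything below is an illustrative assumption, not the paper's construction: the linear best response with a manipulation budget, the logistic policy, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy instance: outcomes depend on one fixed causal feature,
# while agents game whatever linear policy theta is deployed.
n, d = 200, 5
X = rng.normal(size=(n, d))
w_true = np.zeros(d)
w_true[0] = 1.0
y = (X @ w_true > 0).astype(float)

def respond(X, theta, budget=0.5):
    """Boundedly rational agent response (assumed form): each agent moves
    at most `budget` along the normalized policy direction theta."""
    g = theta / (np.linalg.norm(theta) + 1e-12)
    return X + budget * g  # features after the policy-induced shift

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(theta, X, y):
    Xs = respond(X, theta)  # endogenous distribution shift
    p = sigmoid(Xs @ theta)
    return -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))

def perf_grad(theta, X, y):
    """Performative gradient that differentiates *through* the response.
    For this response, d(x_i' @ theta)/d(theta) = x_i' exactly, so the
    gradient is the logistic gradient evaluated on the shifted features."""
    Xs = respond(X, theta)
    p = sigmoid(Xs @ theta)
    return Xs.T @ (p - y) / len(y)

theta = 0.1 * rng.normal(size=d)
loss_before = loss(theta, X, y)
for _ in range(300):
    theta -= 0.5 * perf_grad(theta, X, y)
loss_after = loss(theta, X, y)
print(loss_after < loss_before)  # descent reduces loss on the shifted data
```

Because the assumed response is smooth in theta, the update uses an exact first-order gradient at each step, which is the source of the sample-efficiency advantage the abstract claims over bandit-feedback or zero-order schemes that must estimate this gradient from noisy deployments.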
Problem

Research questions and friction points this paper is trying to address.

Learning with strategic agents
Handling endogenous distribution shifts
Relaxing parametric assumptions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Causal inference without parametric assumptions
Low-dimensional structure discovery
Gradient-based policy optimization algorithm
Qianyi Chen
Postdoc, Westlake University
AI for Science · Physics-informed ML · Edge Computing · Structural Health Monitoring
Ying Chen
Department of IEOR, University of California, Berkeley
Bo Li
School of Economics and Management, Tsinghua University