Off-Policy Evaluation for Sequential Persuasion Process with Unobserved Confounding

📅 2025-04-01
🤖 AI Summary
Conventional Bayesian persuasion frameworks fail to account for unobserved confounders that distort beliefs and decisions in sender–receiver interactions. Method: We model sequential persuasion as a partially observable Markov decision process (POMDP) in which the sender observes only the receiver's actions, not their latent beliefs or confounding states, and we derive the optimal signaling policy from this formulation. We establish, for the first time, a theoretical equivalence between optimal observation-based POMDP policies and optimal signaling policies. Furthermore, we integrate proximal causal learning to enable off-policy evaluation from purely historical observational data. Contribution: Our work removes Bayesian persuasion's theoretical reliance on the no-unmeasured-confounding assumption. Practically, it enables new signaling strategies to be evaluated without online experiments, substantially reducing deployment costs while preserving statistical validity.

📝 Abstract
In this paper, we expand the Bayesian persuasion framework to account for unobserved confounding variables in sender-receiver interactions. While traditional models assume that belief updates follow Bayesian principles, real-world scenarios often involve hidden variables that impact the receiver's belief formation and decision-making. We conceptualize this as a sequential decision-making problem, where the sender and receiver interact over multiple rounds. In each round, the sender communicates with the receiver, who also interacts with the environment. Crucially, the receiver's belief update is affected by an unobserved confounding variable. By reformulating this scenario as a Partially Observable Markov Decision Process (POMDP), we capture the sender's incomplete information regarding both the dynamics of the receiver's beliefs and the unobserved confounder. We prove that finding an optimal observation-based policy in this POMDP is equivalent to solving for an optimal signaling strategy in the original persuasion framework. Furthermore, we demonstrate how this reformulation facilitates the application of proximal learning for off-policy evaluation in the persuasion process. This advancement enables the sender to evaluate alternative signaling strategies using only observational data from a behavioral policy, thus eliminating the necessity for costly new experiments.
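To make the round structure concrete, here is a minimal toy simulation of one interaction loop in the spirit of the abstract: the sender signals, the receiver performs a Bayesian update that is silently tilted by an unobserved confounder, and the sender observes only the resulting action. The binary state, the specific likelihood tilt, and all numeric values are illustrative assumptions, not the paper's model:

```python
import numpy as np

rng = np.random.default_rng(0)

def receiver_posterior(prior, signal, confounder):
    """Bayesian update on a binary state; the unobserved confounder
    tilts the likelihood the receiver assigns to the sender's signal."""
    lik_state1 = 0.8 + 0.15 * confounder   # P(signal=1 | state=1), biased
    lik_state0 = 0.3 - 0.15 * confounder   # P(signal=1 | state=0), biased
    if signal == 0:
        lik_state1, lik_state0 = 1 - lik_state1, 1 - lik_state0
    return prior * lik_state1 / (prior * lik_state1 + (1 - prior) * lik_state0)

belief = 0.5                               # receiver's latent belief (hidden from sender)
for t in range(5):
    u = rng.choice([-1.0, 1.0])            # unobserved confounder, hidden from sender
    signal = int(rng.random() < 0.6)       # sender's (behavioral) signaling policy
    belief = receiver_posterior(belief, signal, u)
    action = int(belief > 0.5)             # receiver acts; sender observes only this
```

Because the sender sees `action` but never `belief` or `u`, its decision problem is naturally partially observable, which is the intuition behind the POMDP reformulation.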
Problem

Research questions and friction points this paper is trying to address.

Handles unobserved confounding in sender-receiver persuasion
Models sequential persuasion as POMDP for belief dynamics
Enables off-policy evaluation using observational data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bayesian persuasion with unobserved confounding variables
POMDP reformulation for sequential persuasion
Proximal learning for off-policy evaluation
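As a hedged sketch of the proximal-learning idea in the simplest linear-Gaussian case (an illustration of the general technique, not the paper's estimator; the data-generating process and all variable names are invented for the example): a treatment-side proxy `z` instruments an outcome-side proxy `w` to fit a linear outcome bridge function via two-stage least squares, and the population average of the bridge estimates the value of a new policy despite the unobserved confounder `u`:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
u = rng.normal(size=n)                     # unobserved confounder
z = u + rng.normal(scale=0.5, size=n)      # treatment-side proxy (negative control)
w = u + rng.normal(scale=0.5, size=n)      # outcome-side proxy
a = (rng.random(n) < 1 / (1 + np.exp(-u))).astype(int)  # confounded behavior policy
y = a + u + rng.normal(scale=0.5, size=n)  # outcome: true effect of a is +1

def bridge_2sls(mask):
    """Fit a linear outcome bridge h(w) = b0 + b1*w on one treatment arm,
    using z as an instrument (two-stage least squares)."""
    Z = np.column_stack([np.ones(mask.sum()), z[mask]])
    W = np.column_stack([np.ones(mask.sum()), w[mask]])
    # Stage 1: project W on Z; stage 2: regress y on the projection.
    W_hat = Z @ np.linalg.lstsq(Z, W, rcond=None)[0]
    return np.linalg.lstsq(W_hat, y[mask], rcond=None)[0]

b_hat = bridge_2sls(a == 1)
# Value of the "always signal a=1" policy: average h(w, 1) over the population.
value_hat = np.mean(np.column_stack([np.ones(n), w]) @ b_hat)

naive = y[a == 1].mean()  # confounded plug-in estimate, biased upward here
```

In this toy setup the proximal estimate recovers the true policy value of roughly 1, while the naive arm average is inflated because high-`u` individuals are both more likely to act and to have higher outcomes.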