Friend or Foe: Delegating to an AI Whose Alignment is Unknown

📅 2025-09-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the information disclosure problem for decision-makers under uncertain AI alignment—specifically, in clinical decision support, where designers must balance the benefit of disclosing patient features to improve treatment prediction accuracy against the risk of AI misuse due to objective misalignment. Method: We propose an optimal information design framework that strategically selects which patient features to disclose. Crucially, it identifies and selectively reveals sensitive features that identify rare, high-need subpopulations, while aggregating other patients into coarse categories. The framework formally integrates Bayesian decision theory and mechanism design to model the interplay between the designer’s uncertain beliefs about AI alignment and the disclosure policy. Contribution/Results: We provide theoretical guarantees and empirical validation showing that our approach significantly improves decision efficacy and reduces mismatch risk. It yields an interpretable, controllable information-sharing paradigm for trustworthy human–AI collaborative decision-making under alignment uncertainty.

📝 Abstract
AI systems have the potential to improve decision-making, but decision makers face the risk that the AI may be misaligned with their objectives. We study this problem in the context of a treatment decision, where a designer decides which patient attributes to reveal to an AI before receiving a prediction of the patient's need for treatment. Providing the AI with more information increases the benefits of an aligned AI but also amplifies the harm from a misaligned one. We characterize how the designer should select attributes to balance these competing forces, depending on their beliefs about the AI's reliability. We show that the designer should optimally disclose attributes that identify rare segments of the population in which the need for treatment is high, and pool the remaining patients.

Problem

Research questions and friction points this paper is trying to address.

Balancing information disclosure risks with AI alignment uncertainty
Optimizing attribute selection for AI-assisted treatment decisions
Managing rare population segments under potential AI misalignment

Innovation

Methods, ideas, or system contributions that make the work stand out.

Attribute selection based on AI reliability beliefs
Disclosing rare high-treatment-need population segments
Pooling remaining patients to balance risks
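
Illustration

The tradeoff described in the abstract (more information helps an aligned AI and hurts more under a misaligned one) can be sketched with a toy calculation. This is not the paper's model. Everything below is an assumption made for illustration: the designer commits to following the AI's treat / do-not-treat recommendation for each disclosed group, an aligned AI picks the action that is best for the designer, a misaligned AI picks the worst one, not treating is normalized to a payoff of 0, and the names `Segment`, `net_benefit`, and `policy_value` as well as the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    name: str
    mass: float  # share of the population (assumed)
    need: float  # probability that a patient in this segment needs treatment (assumed)


def net_benefit(cell, gain=1.0, cost=1.0):
    """Designer's payoff from treating everyone in `cell`, relative to treating no one.

    Treating a patient who needs it yields `gain`; treating one who does not costs `cost`.
    """
    return sum(s.mass * (s.need * gain - (1 - s.need) * cost) for s in cell)


def policy_value(partition, p_aligned, gain=1.0, cost=1.0):
    """Expected designer payoff for a disclosure policy.

    `partition` is a list of cells; the AI learns only which cell a patient is in.
    With probability `p_aligned` the AI chooses the better action per cell,
    otherwise it chooses the worse one (toy adversarial assumption).
    """
    total = 0.0
    for cell in partition:
        a = net_benefit(cell, gain, cost)
        total += p_aligned * max(a, 0.0) + (1 - p_aligned) * min(a, 0.0)
    return total


# Hypothetical population: a rare high-need segment and a common low-need one.
rare = Segment("rare_high_need", mass=0.1, need=0.9)
common = Segment("common_low_need", mass=0.9, need=0.3)

full_disclosure = [[rare], [common]]  # AI can tell the two segments apart
full_pooling = [[rare, common]]       # AI sees a single undifferentiated group

for p in (0.0, 0.2, 0.8, 1.0):
    print(p, policy_value(full_disclosure, p), policy_value(full_pooling, p))
```

In this toy setup, disclosure raises the payoff when the AI is certainly aligned and lowers it when the AI is certainly misaligned, which is the tension the abstract describes. The sketch does not reproduce the paper's characterization of the optimal policy; it only makes the two competing forces concrete.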