🤖 AI Summary
This paper addresses the challenge of operationalizing abstract target variables in machine learning by investigating human-AI collaboration mechanisms for constructing proxy variables. The authors propose and empirically compare two collaborative strategies: “relevance-first” (humans lead by selecting proxies that align with domain intent) and “performance-first” (machines lead by recommending proxies that improve model metrics). The evaluation combines a controlled user study (N=20) of a proxy construction task with iterative modeling and analysis of user behavior. Results show that the performance-first approach accelerates iteration and decision-making but biases users toward well-performing proxies that are misaligned with the application goal, whereas the relevance-first strategy, though slower, better preserves domain semantics and task fidelity. The work empirically surfaces a trade-off between iteration efficiency and objective alignment in proxy variable selection and contributes evidence-based design insights for human-AI collaboration during the problem-formulation phase of predictive modeling, bridging a critical gap between domain understanding and algorithmic implementation.
📝 Abstract
Predictive modeling has the potential to enhance human decision-making. However, many predictive models fail in practice because of flawed problem formulation: when the prediction target is an abstract concept or construct, practitioners must define an appropriate target variable as a proxy to operationalize the construct of interest. The choice of an appropriate proxy target variable is rarely self-evident in practice, requiring both domain knowledge and iterative data modeling. This process is inherently collaborative, involving both domain experts and data scientists. In this work, we explore how human-machine teaming can support this process by accelerating iterations while preserving human judgment. We study the impact of two human-machine teaming strategies on proxy construction: 1) relevance-first: humans leading the process by selecting relevant proxies, and 2) performance-first: machines leading the process by recommending proxies based on predictive performance. Based on a controlled user study of a proxy construction task (N = 20), we show that the performance-first strategy facilitated faster iterations and decision-making, but also biased users towards well-performing proxies that are misaligned with the application goal. Our study highlights the opportunities and risks of human-machine teaming in operationalizing machine learning target variables, yielding insights for future research on exploiting these opportunities while mitigating the risks.