Risks and Opportunities in Human-Machine Teaming in Operationalizing Machine Learning Target Variables

📅 2025-10-29
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This paper addresses the challenge of operationalizing abstract target variables in machine learning by investigating human-AI collaboration mechanisms for constructing proxy variables. We propose and empirically compare two collaborative strategies: "relevance-first" (humans lead by selecting proxies semantically aligned with domain intent) and "performance-first" (machines lead by recommending proxies that maximize immediate model metrics). A controlled user study (N=20), iterative modeling, and human-factors analysis provide the evaluation. Results show that the performance-first approach accelerates iteration but risks objective misalignment, whereas the relevance-first strategy, though marginally slower, better preserves domain semantics and task fidelity. The work empirically uncovers an inherent trade-off between efficiency and objective alignment in proxy variable selection, and it contributes actionable, evidence-based design principles for human-AI collaboration during the problem-formulation phase of predictive modeling, bridging a critical gap between domain understanding and algorithmic implementation.

📝 Abstract
Predictive modeling has the potential to enhance human decision-making. However, many predictive models fail in practice due to problematic problem formulation in cases where the prediction target is an abstract concept or construct and practitioners need to define an appropriate target variable as a proxy to operationalize the construct of interest. The choice of an appropriate proxy target variable is rarely self-evident in practice, requiring both domain knowledge and iterative data modeling. This process is inherently collaborative, involving both domain experts and data scientists. In this work, we explore how human-machine teaming can support this process by accelerating iterations while preserving human judgment. We study the impact of two human-machine teaming strategies on proxy construction: 1) relevance-first: humans leading the process by selecting relevant proxies, and 2) performance-first: machines leading the process by recommending proxies based on predictive performance. Based on a controlled user study of a proxy construction task (N = 20), we show that the performance-first strategy facilitated faster iterations and decision-making, but also biased users towards well-performing proxies that are misaligned with the application goal. Our study highlights the opportunities and risks of human-machine teaming in operationalizing machine learning target variables, yielding insights for future research to explore the opportunities and mitigate the risks.
Problem

Research questions and friction points this paper is trying to address.

Studying human-machine collaboration in defining ML target variables
Comparing relevance-first and performance-first proxy selection strategies
Analyzing risks and opportunities in operationalizing abstract ML constructs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Human-machine teaming accelerates proxy variable iterations
Performance-first strategy speeds decision-making but causes bias
Relevance-first approach preserves human judgment in target selection
👥 Authors

Mengtian Guo
University of North Carolina at Chapel Hill, USA

David Gotz
University of North Carolina at Chapel Hill, USA
Visual Analytics · Medical Informatics · Information Visualization · Data Science

Yue Wang
University of North Carolina at Chapel Hill, USA