Measurement as Bricolage: Examining How Data Scientists Construct Target Variables for Predictive Modeling Tasks

📅 2025-07-03

📈 Citations: 0

✨ Influential: 0

career value

191K/year

🤖 AI Summary

Data scientists frequently lack systematic guidance when operationalizing ambiguous concepts (e.g., “writing authenticity,” “medical need”) into model-ready proxy target variables. To address this, we conducted semi-structured interviews with 15 data scientists across education and healthcare domains, followed by cross-domain thematic coding. We propose the “assemblage metrics” framework, identifying five core design criteria: validity, simplicity, predictiveness, portability, and resource efficiency. Our analysis reveals an iterative, problem-reconstruction–driven practice in which target variables are dynamically negotiated through trade-offs among these criteria. This work offers the first systematic characterization of such trade-offs in proxy target construction. It contributes a theoretically grounded framework and methodological tools for HCI, CSCW, and machine learning communities to support principled, transparent, and trustworthy predictive modeling—bridging conceptual abstraction with operationalizable measurement.

Technology Category

Application Category

📝 Abstract

Data scientists often formulate predictive modeling tasks involving fuzzy, hard-to-define concepts, such as the "authenticity" of student writing or the "healthcare need" of a patient. Yet the process by which data scientists translate fuzzy concepts into a concrete, proxy target variable remains poorly understood. We interview fifteen data scientists in education (N=8) and healthcare (N=7) to understand how they construct target variables for predictive modeling tasks. Our findings suggest that data scientists construct target variables through a bricolage process, involving iterative negotiation between high-level measurement objectives and low-level practical constraints. Data scientists attempt to satisfy five major criteria for a target variable through bricolage: validity, simplicity, predictability, portability, and resource requirements. To achieve this, data scientists adaptively use problem (re)formulation strategies, such as swapping out one candidate target variable for another when the first fails to meet certain criteria (e.g., predictability), or composing multiple outcomes into a single target variable to capture a more holistic set of modeling objectives. Based on our findings, we present opportunities for future HCI, CSCW, and ML research to better support the art and science of target variable construction.

Problem

Research questions and friction points this paper is trying to address.

How data scientists define fuzzy concepts for predictive modeling

Understanding the bricolage process in target variable construction

Balancing validity, simplicity, and practicality in variable selection

Innovation

Methods, ideas, or system contributions that make the work stand out.

Bricolage process for target variable construction

Iterative negotiation between objectives and constraints

Adaptive problem reformulation strategies

🔎 Similar Papers

The Disagreement Problem in Explainable Machine Learning: A Practitioner's Perspective