🤖 AI Summary
This work addresses the challenge of integrating human intuition and domain expertise into machine learning training in order to optimize latent space representations and assess how such intervention affects model bias. To this end, we propose HILL, an interactive framework built around a novel "human-in-the-latent-loop" paradigm: users directly manipulate latent representations via a visual interface, and the system formalizes these edits as differentiable teacher signals for knowledge distillation, enabling end-to-end latent space remapping. Experiments demonstrate that HILL improves model performance while preserving generalization. A user study further provides the first empirical evidence that human intervention can inadvertently introduce new forms of bias. The work thus establishes a paradigm for trustworthy human-AI collaborative learning, contributes technical innovations in differentiable human feedback modeling and latent-space distillation, and offers critical insights into the risks of unexamined human guidance in latent-space optimization.
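To make the distillation idea concrete, here is a minimal PyTorch sketch of how user-edited latents could act as a teacher signal alongside the task objective. The toy encoder, the `hill_loss` name, the MSE distillation term, and the weighting `alpha` are illustrative assumptions, not the paper's actual implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Toy classifier standing in for the trained model; exposes its latents."""
    def __init__(self, in_dim=784, latent_dim=2, n_classes=10):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),
        )
        self.head = nn.Linear(latent_dim, n_classes)

    def forward(self, x):
        z = self.backbone(x)          # latent representation shown to the user
        return self.head(z), z

def hill_loss(logits, z, y, z_edited, alpha=0.5):
    # Task term: standard supervised objective.
    task = F.cross_entropy(logits, y)
    # Distillation term: pull the model's latents toward the user-edited
    # layout; detach() treats the edits as a fixed teacher signal.
    distill = F.mse_loss(z, z_edited.detach())
    return (1 - alpha) * task + alpha * distill

# Illustrative training step with random stand-in data.
model = Encoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
z_edited = torch.randn(32, 2)  # would come from the user's interactive edits

logits, z = model(x)
loss = hill_loss(logits, z, y, z_edited)
opt.zero_grad()
loss.backward()
opt.step()
```

Because the distillation term is differentiable, gradients flow back through the encoder, so the user's edits reshape the latent space end-to-end rather than merely relabeling points.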
📝 Abstract
Latent space representations are critical for understanding and improving the behavior of machine learning models, yet they often remain obscure and intricate. Opening the latent space to human inspection and exploration offers an opportunity to bring valuable intuition and domain expertise into training. In this work, we present HILL, an interactive framework that lets users incorporate their intuition into model training by interactively reshaping latent space representations. The modifications are infused into the training loop via a novel approach inspired by knowledge distillation, which treats the user's edits as a teacher that guides the model in reshaping its intrinsic latent representation. This process allows the model to converge more effectively and overcome training inefficiencies, while also providing beneficial insights to the user. We evaluated HILL in a user study in which participants were tasked with training an optimal model while we closely observed their strategies. The results demonstrate that human-guided latent space modifications enhance model performance while maintaining generalization, yet they also reveal the risk of introducing user biases. Our work introduces a human-AI interaction paradigm that infuses human intuition into model training and critically examines the impact of human intervention on training strategies and potential biases.