🤖 AI Summary
This study addresses a critical gap in the design of proactive AI writing assistants: reliance on users' self-reported preferences despite little empirical understanding of the behavioral drivers behind their actual needs. Through a factorial vignette study involving 750 pairwise comparisons from 50 participants, the work reveals a significant "perception-behavior gap": users' stated preferences often diverge from, and sometimes directly contradict, their observed behaviors. Using behavioral data to inform assistance decisions yields a system accuracy of 61.3%, significantly outperforming both preference-based approaches (57.7%) and naive baselines. These findings challenge the prevailing design paradigm rooted in user introspection and offer a new pathway toward more effective proactive AI interactions grounded in empirical behavioral evidence.
📝 Abstract
Proactive AI writing assistants need to predict when users want drafting help, yet we lack empirical understanding of what drives these preferences. Through a factorial vignette study with 50 participants making 750 pairwise comparisons, we find that compositional effort dominates decisions ($\rho = 0.597$) while urgency shows no predictive power ($\rho \approx 0$). More critically, users exhibit a striking perception-behavior gap: they rank urgency first in self-reports despite it being the weakest behavioral driver, a complete preference inversion. This misalignment has measurable consequences. Systems designed from users' stated preferences achieve only 57.7\% accuracy, underperforming even naive baselines, while systems using behavioral patterns reach a significantly higher 61.3\% ($p<0.05$). These findings demonstrate that relying on user introspection for system design actively misleads optimization, with direct implications for proactive natural language generation (NLG) systems.
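To make the reported statistics concrete, below is a minimal, hypothetical sketch (not the authors' code) of how one might estimate factor importance and compare predictor accuracies from pairwise-comparison data: Spearman's $\rho$ between each factor's level difference and the observed choice, plus a paired McNemar test between a preference-style and a behavior-style predictor. The simulated data, factor encoding, and choice of significance test are all assumptions for illustration.

```python
# Hypothetical sketch of the analyses the abstract describes; the data here
# is simulated, and the factor names/encodings are assumptions, not the study's.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 750  # pairwise comparisons, matching the study's count

# Per comparison: difference in each factor's level between the two vignettes,
# and which vignette the participant chose (1 = first).
effort_diff = rng.integers(-2, 3, n)
urgency_diff = rng.integers(-2, 3, n)
# Simulate choices driven mostly by effort, not urgency (the reported pattern).
choice = (effort_diff + rng.normal(0, 1.5, n) > 0).astype(int)

# Behavioral importance: Spearman correlation between each factor's
# level difference and the observed choice.
for name, diff in [("effort", effort_diff), ("urgency", urgency_diff)]:
    rho, p = stats.spearmanr(diff, choice)
    print(f"{name}: rho={rho:.3f}, p={p:.3g}")

# Compare two predictors on the same items with McNemar's test
# (paired, since both classify the identical 750 comparisons).
pref_pred = (urgency_diff > 0).astype(int)   # preference-style: urgency-first
behav_pred = (effort_diff > 0).astype(int)   # behavior-style: effort-first
b = np.sum((pref_pred == choice) & (behav_pred != choice))
c = np.sum((pref_pred != choice) & (behav_pred == choice))
chi2 = (abs(b - c) - 1) ** 2 / (b + c)       # continuity-corrected statistic
p_mcnemar = stats.chi2.sf(chi2, df=1)
print(f"accuracy pref={np.mean(pref_pred == choice):.3f}, "
      f"behav={np.mean(behav_pred == choice):.3f}, McNemar p={p_mcnemar:.3g}")
```

On simulated data the behavior-style predictor wins by construction; the point of the sketch is only to show how a factor can correlate strongly with choices while a self-reported favorite contributes no predictive signal.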