🤖 AI Summary
This study addresses the limited understanding, within open-source communities, of how the characteristics of a newcomer’s first issue influence their long-term retention, a gap that limits the effectiveness of task recommendation and onboarding strategies. Through a large-scale empirical analysis that integrates multidimensional issue and community-interaction data, the work combines machine learning prediction models, SHAP-based interpretability, and causal inference. The predictive analysis shows that interaction-related features are more strongly associated with newcomer retention than intrinsic issue attributes. The causal analysis indicates that first issues with a particular combination of features have higher retention potential: they are reported by moderately experienced contributors, attract moderate discussion with active involvement from project members, and carry neutral-to-slightly-negative comment sentiment. These findings offer actionable insights for designing effective community onboarding interventions.
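SHAP explains a model's prediction by assigning each input feature a Shapley value, its average marginal contribution across all coalitions of features. As a minimal sketch of that idea, assuming a tiny model and a single reference (baseline) input, the Python snippet below computes exact Shapley values by enumerating coalitions. It is neither the `shap` library nor the study's pipeline. The function `shapley_values`, the toy scoring function `toy_retention_score`, and its two features are hypothetical illustrations.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Sequence


def shapley_values(
    model: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> list[float]:
    """Exact Shapley attributions of model(x) - model(baseline).

    Features outside a coalition are set to their baseline value. The cost
    is exponential in the number of features, so this suits toy models only.
    """
    n = len(x)

    def value(coalition: set[int]) -> float:
        # Evaluate the model with only the coalition's features taken from x.
        point = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(point)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                contribution += weight * (value(s | {i}) - value(s))
        phi.append(contribution)
    return phi


def toy_retention_score(features: Sequence[float]) -> float:
    # Hypothetical model: features = [project_member_joined, discussion_level].
    member, discussion = features
    return 0.2 * member + 0.1 * discussion + 0.3 * member * discussion


phi = shapley_values(toy_retention_score, x=[1.0, 1.0], baseline=[0.0, 0.0])
print([round(v, 2) for v in phi])  # [0.35, 0.25]
```

The attributions sum to the difference between the prediction and the baseline prediction (0.6 here), which is the efficiency property SHAP relies on. Exact enumeration grows exponentially with the number of features, which is why practical SHAP explainers rely on model-specific algorithms or sampling instead.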
📝 Abstract
Sustaining newcomer participation is critical for the long-term health of open-source communities. Although prior research has explored various task recommendation approaches to help newcomers resolve their first issue, these methods overlook how the characteristics of first issues may influence newcomers' long-term retention. This gap limits our understanding of whether initial success leads to sustained participation and hinders effective onboarding design. In this paper, we conduct a large-scale empirical study of how first-issue characteristics affect newcomer retention. We combine predictive analysis, interpretability techniques, and causal inference to estimate the causal effects of issue characteristics on retention outcomes. The predictive models provide the basis for the interpretability analysis, which shows that interaction-related characteristics are more strongly associated with retention than intrinsic issue attributes. The causal analysis further reveals that first issues reported by moderately experienced contributors, accompanied by moderate discussion intensity and participation from project members, and carrying neutral or slightly negative comment sentiment have higher retention potential. These findings provide actionable insights for open-source software (OSS) maintainers on designing issue management practices that better support long-term newcomer retention.
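The abstract does not state which causal-inference estimator the study uses. One standard option is backdoor adjustment by stratification: compare retention between issues that have and lack a characteristic within each level of a confounder, then average those differences. The Python snippet below is a minimal sketch of that technique, assuming a binary characteristic (for example, whether a project member joined the discussion), a binary retention outcome, and a single discrete confounder such as a reporter-experience bucket. The `FirstIssue` record, the `stratified_ate` function, and the toy records are hypothetical and not taken from the paper.

```python
from collections import defaultdict
from typing import NamedTuple


class FirstIssue(NamedTuple):
    """A newcomer's first issue (hypothetical schema for illustration)."""

    stratum: str    # discrete confounder, e.g. reporter-experience bucket
    treated: bool   # characteristic present, e.g. a project member commented
    retained: bool  # outcome, e.g. the newcomer contributed again later


def stratified_ate(issues: list[FirstIssue]) -> float:
    """Average treatment effect via backdoor adjustment by stratification.

    Within each stratum, take the difference in retention rates between
    treated and untreated issues, then average these differences weighted
    by stratum size. Strata lacking either arm are skipped (no overlap),
    and the weights are renormalized over the remaining strata.
    """
    arms_by_stratum = defaultdict(lambda: {True: [], False: []})
    for issue in issues:
        arms_by_stratum[issue.stratum][issue.treated].append(issue.retained)

    weighted_sum = 0.0
    total_size = 0
    for arms in arms_by_stratum.values():
        treated, control = arms[True], arms[False]
        if not treated or not control:
            continue  # effect not identifiable in this stratum
        size = len(treated) + len(control)
        effect = sum(treated) / len(treated) - sum(control) / len(control)
        weighted_sum += size * effect
        total_size += size
    if total_size == 0:
        raise ValueError("no stratum contains both treated and untreated issues")
    return weighted_sum / total_size


# Invented toy records, not data from the study.
toy_issues = [
    FirstIssue("novice", True, True),
    FirstIssue("novice", False, False),
    FirstIssue("novice", False, True),
    FirstIssue("experienced", True, True),
    FirstIssue("experienced", True, False),
    FirstIssue("experienced", False, False),
]
print(stratified_ate(toy_issues))  # 0.5
```

The estimate carries a causal reading only if the strata capture all relevant confounding and every stratum contains both treated and untreated issues; strata without overlap are skipped rather than extrapolated.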