A Test of Time: Predicting the Sustainable Success of Online Collaboration in Wikipedia

📅 2024-10-24
🏛️ arXiv.org
🤖 AI Summary
This paper addresses the challenge of sustaining long-term quality in online collaborative projects such as Wikipedia. It introduces "sustainable success," a metric that measures an article's ability to remain high quality after it has been recognized as high quality. Using data from more than 40,000 Wikipedia articles, the authors build SustainPedia, a dataset that pairs each article's sustainable success label with over 300 explanatory features, including edit history, user experience, and team composition, and they train machine learning models to predict sustainability. The best-performing model achieves an AU-ROC of 0.88 on average. The analysis yields a counterintuitive finding: the longer an article takes to be recognized as high quality, the more likely it is to keep that status. User experience emerges as the most important predictor. The authors suggest that the same social dynamics may matter in other collective efforts, such as online activism and crowdsourced open-source software.

📝 Abstract
The Internet has significantly expanded the potential for global collaboration, allowing millions of users to contribute to collective projects like Wikipedia. While prior work has assessed the success of online collaborations, most approaches are time-agnostic, evaluating success without considering its longevity. Research on the factors that ensure the long-term preservation of high-quality standards in online collaboration is scarce. In this study, we address this gap. We propose a novel metric, "Sustainable Success," which measures the ability of collaborative efforts to maintain their quality over time. Using Wikipedia as a case study, we introduce the SustainPedia dataset, which compiles data from over 40K Wikipedia articles, including each article's sustainable success label and more than 300 explanatory features such as edit history, user experience, and team composition. Using this dataset, we develop machine learning models to predict the sustainable success of Wikipedia articles. Our best-performing model achieves a high AU-ROC score of 0.88 on average. Our analysis reveals important insights. For example, we find that the longer an article takes to be recognized as high-quality, the more likely it is to maintain that status over time (i.e., be sustainable). Additionally, user experience emerged as the most critical predictor of sustainability. Our analysis provides insights into broader collective actions beyond Wikipedia (e.g., online activism, crowdsourced open-source software), where the same social dynamics that drive success on Wikipedia might play a role. We make all data and code used for this study publicly available for further research.
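
The abstract reports an AU-ROC of 0.88 "on average" but does not say here what the average is taken over. As a rough illustration of the metric only, the sketch below computes AU-ROC with the standard rank-sum (Mann-Whitney) identity and then averages it over several evaluation splits. The function names `au_roc` and `mean_au_roc`, the idea of averaging over splits, and the toy scores are assumptions for illustration, not the authors' evaluation code.

```python
def au_roc(labels, scores):
    """AU-ROC via the rank-sum identity: the probability that a randomly
    chosen positive example is scored above a randomly chosen negative one
    (ties count as one half)."""
    pairs = sorted(zip(scores, labels))
    ranks = [0.0] * len(pairs)
    i = 0
    while i < len(pairs):
        # Find the run of tied scores starting at i and give each the average rank.
        j = i
        while j + 1 < len(pairs) and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1

    n_pos = sum(1 for _, label in pairs if label == 1)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AU-ROC needs at least one example of each class")

    pos_rank_sum = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def mean_au_roc(splits):
    """Average AU-ROC over (labels, scores) pairs, one pair per split.
    Averaging over splits is an assumption about what "on average" means."""
    values = [au_roc(labels, scores) for labels, scores in splits]
    return sum(values) / len(values)


# Toy data: 1 = sustainable article, 0 = not sustainable (made-up scores).
toy_splits = [
    ([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]),
    ([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9]),
]
print(mean_au_roc(toy_splits))
```

An AU-ROC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, so a value of 0.88 means a sustainable article is scored above a non-sustainable one in roughly 88% of such pairs.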

Problem

Research questions and friction points this paper is trying to address.

Predicting long-term quality maintenance in online collaborations.
Identifying factors ensuring sustainable success in Wikipedia articles.
Developing machine learning models for sustainable collaboration prediction.

Innovation

Methods, ideas, or system contributions that make the work stand out.

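Proposed the "Sustainable Success" metric, which measures how well quality is maintained over time.
Created the SustainPedia dataset, covering over 40K Wikipedia articles and more than 300 explanatory features.
Built machine learning models that predict the sustainable success of articles.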
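
To make the idea of a sustainable success label concrete, here is a minimal sketch of how such a label and a time-to-recognition feature might be derived from an article's history. The summary above does not give the paper's exact labeling rule, so everything here is an assumption for illustration: the `ArticleRecord` fields, the rule "still holds its high-quality status at the end of the observation period", and the example dates are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ArticleRecord:
    """Hypothetical per-article history; field names are illustrative only."""
    created: date                   # when the article was first created
    promoted: date                  # when it was recognized as high quality
    demoted: Optional[date] = None  # when it lost that status, if it ever did


def is_sustainable(article: ArticleRecord, observed_until: date) -> bool:
    """Assumed rule: an article is sustainable if it was not demoted
    on or before the end of the observation period."""
    return article.demoted is None or article.demoted > observed_until


def days_to_promotion(article: ArticleRecord) -> int:
    """Time from creation to high-quality recognition, the kind of temporal
    feature the abstract links to higher sustainability."""
    return (article.promoted - article.created).days


observation_end = date(2023, 1, 1)
kept_status = ArticleRecord(created=date(2005, 3, 1), promoted=date(2012, 3, 1))
lost_status = ArticleRecord(
    created=date(2006, 1, 1), promoted=date(2006, 7, 1), demoted=date(2010, 5, 1)
)

print(is_sustainable(kept_status, observation_end), days_to_promotion(kept_status))
print(is_sustainable(lost_status, observation_end), days_to_promotion(lost_status))
```

A binary label of this kind is what makes the task a classification problem that can be scored with AU-ROC; the actual SustainPedia labels and features should be taken from the authors' released data and code.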