🤖 AI Summary
This study investigates the mechanisms that sustain cross-community collaboration in open-source software ecosystems and how they relate to community survival. Drawing on contribution data from 464 cybersecurity projects and 11,372 contributors between October 2001 and May 2022, the authors construct a contributor–repository bipartite graph and introduce a “recognition/repeat-relationship” account of cross-boundary work. The empirical analysis combines Louvain community detection with survival analysis based on a per-cohort residualised hazard. Cross-community collaboration is concentrated in a thin “carrier layer”: the top 50 cross-community contributors account for 54% of the 4,015 merged inter-community pull requests. Boundary friction falls as a contributor's breadth increases: acceptance of inter-community pull requests rises from 42% at breadth k=1 to 87% at k=5-9, and median latency drops from 147 to 49 hours. Despite the stability of this carrier layer, communities formed later face markedly higher hazard, so survival is cohort-structured.
📝 Abstract
We measure cross-boundary collaboration in an open-source software (OSS) ecosystem by reconstructing the bipartite contributor-repository graph of 464 cybersecurity projects and 11,372 contributors active from October 2001 to May 2022 (Rawsec Cybersecurity Inventory). Louvain community detection identifies 163 non-singleton communities; per-community contributor count scales superlinearly with repository count (n_contributors ~ n_repos^1.4), and community formation follows a logistic trajectory saturating around 2018. Three patterns support a recognition/repeat-relationship account of cross-boundary work. First, cross-community work concentrates in a thin carrier layer: only nine canonical humans span seven or more communities at the commit level, authoring 14% of 4,015 inter-community merged pull requests; the top 50 cross-community contributors produce 54%. Second, boundary friction is a recognition cost, not a fixed boundary property: inter-community pull-request acceptance rises from 42% at breadth k=1 to 87% at k=5-9, with median latency compressing from 147 h to 49 h. Third, community survival is cohort-structured: the per-cohort residualised hazard rises by an order of magnitude between the pre-2010 and 2018 cohorts, and external community reach predicts survival mainly through size, leaving late cohorts under-served despite a stable carrier layer. The corpus predates mainstream LLM coding assistants; this baseline of carrier-layer thinness, friction gradient, and cohort hazard informs debates on social coding as a template for digital societies and on what AI-mediated OSS ecosystems should not optimise away.
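The superlinear scaling stated in the abstract (n_contributors ~ n_repos^1.4) is the kind of relation usually estimated as a slope on log-log axes. The abstract does not say how the exponent was fitted, so the sketch below is only one plausible approach: an ordinary least-squares fit on log-transformed counts. The function name `fit_power_law_exponent` and the per-community counts are hypothetical; the data are synthetic and are not taken from the paper.

```python
import math


def fit_power_law_exponent(repo_counts, contributor_counts):
    """Estimate (beta, c) in contributors ~ c * repos**beta.

    Assumption: ordinary least squares on log-log values. The paper's
    actual estimation procedure is not given in the abstract.
    """
    # Keep only strictly positive pairs, since log() is undefined otherwise.
    pairs = [
        (math.log(r), math.log(c))
        for r, c in zip(repo_counts, contributor_counts)
        if r > 0 and c > 0
    ]
    if len(pairs) < 2:
        raise ValueError("need at least two positive (repos, contributors) pairs")

    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n

    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    if sxx == 0:
        raise ValueError("repository counts must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)

    beta = sxy / sxx                  # slope on log-log axes = scaling exponent
    log_c = mean_y - beta * mean_x    # intercept on log-log axes
    return beta, math.exp(log_c)


# Synthetic illustration only: hypothetical communities that follow
# contributors = 3 * repos**1.4 exactly, so the fit recovers the exponent
# by construction.
repos = [2, 3, 5, 8, 13, 21]
contributors = [3.0 * r ** 1.4 for r in repos]
beta, prefactor = fit_power_law_exponent(repos, contributors)
```

An exponent above 1 is what "superlinear" means here: doubling a community's repository count more than doubles its contributor count. On real, noisy counts the estimate would carry uncertainty that this sketch does not quantify.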