🤖 AI Summary
The reproducibility crisis in scientific research stems partly from insufficient reporting transparency and inconsistent adherence to standardized practices. This study systematically evaluates 11 automated tools against nine rigor criteria from the ScreenIT group—including open data availability, explicit inclusion/exclusion criteria disclosure, and preregistration—assessing their capacity to detect compliance. Results reveal that individual tools exhibit limited sensitivity for some criteria, whereas ensemble approaches improve overall detection rates; notably, for open data statements a single tool clearly outperforms the rest. This work presents a broad cross-platform empirical comparison of tools, validating tool combination as an effective strategy for rigor assessment. Based on these findings, the authors propose concrete, evidence-based directions for tool development and refinement. All code and data are publicly released, providing a reproducible methodology and practical guidance to advance research transparency and reproducibility.
📝 Abstract
The causes of the reproducibility crisis include a lack of standardization and transparency in scientific reporting. Checklists such as ARRIVE and CONSORT seek to improve transparency, but authors do not always follow them, and peer review often fails to identify missing items. To address these issues, several automated tools have been designed to check different rigor criteria. We conducted a broad comparison of 11 automated tools across 9 rigor criteria from the ScreenIT group. For some criteria, including detection of open data, the comparison revealed a clear winner: a single tool that performed much better than the others. For other criteria, including detection of inclusion and exclusion criteria, a combination of tools exceeded the performance of any one tool. We also identified key areas where developers should focus their efforts to make their tools maximally useful. We conclude with a set of insights and recommendations for stakeholders in the development of rigor and transparency detection tools. The code and data for the study are available at https://github.com/PeterEckmann1/tool-comparison.