🤖 AI Summary
Kabaddi lacks standardized, reproducible data infrastructure, which hinders scientific analysis and evidence-based decision-making. Method: We introduce KabaddiPy, the first open-source Python toolkit for kabaddi analytics, which systematically scrapes (via BeautifulSoup/Selenium), cleans (with pandas), and structures team, player, and match data from professional leagues, notably India’s Pro Kabaddi League (PKL). It supports flexible querying by season, role (raider/defender), and opponent. Contribution/Results: KabaddiPy establishes the first domain-specific standardized data schema and introduces player strategic profiling, e.g., each player's most effective attacking and defending strategies against specific opponents, filling a critical gap in sports analytics. The toolkit is publicly released and pre-populated with full-season data from the PKL, a league with 230 million viewers, enabling predictive modeling, tactical optimization, and causal inference. By providing scalable, structured data, KabaddiPy helps move kabaddi analytics from experience-driven to data-driven practice.
📝 Abstract
Kabaddi, a contact team sport of Indian origin, has seen a dramatic rise in global popularity, highlighted by the upcoming Kabaddi World Cup in 2025 with over sixteen international teams participating, alongside flourishing national leagues such as the Indian Pro Kabaddi League (230 million viewers) and the British Kabaddi League. We present the first open-source Python module that makes Kabaddi statistical data, currently scattered across multiple sources on the internet, easily accessible. The module was developed by systematically web-scraping and collecting team-wise, player-wise, and match-by-match data. The data has been cleaned, organized, and categorized into team overviews and player metrics, each filterable by season. Players are classified as raiders or defenders, and their best strategies for attacking, counter-attacking, and defending against different teams are highlighted. Our module enables continuous monitoring of rapidly growing data streams, helping researchers quickly build on the data to answer critical questions, such as the impact of player inclusion or exclusion on team performance and scoring patterns against specific teams, and to break down opponent gameplay. Data generated from Kabaddi tournaments has so far seen little use, and coaches and players rely heavily on intuition to make decisions and craft strategies. Our module can be used to build predictive models, craft opponent-specific game plans, and identify hidden correlations in the data. This open-source module has the potential to save time, encourage analytical studies of Kabaddi gameplay and player dynamics, and foster reproducible research. The data and code are publicly available: https://github.com/kabaddiPy/kabaddiPy
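To make the kind of question described above concrete, here is a minimal sketch of filtering player records by season and role and summarizing a player's scoring against each opponent. It does not use KabaddiPy's actual API: the `PlayerMatchRecord` schema, the function names, and all rows are hypothetical placeholders invented for illustration, not real league statistics.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PlayerMatchRecord:
    # Hypothetical schema for illustration; not KabaddiPy's data model.
    season: int
    player: str
    role: str        # "raider" or "defender"
    opponent: str
    points: int


# Made-up placeholder rows, not real league statistics.
RECORDS = [
    PlayerMatchRecord(9, "Player A", "raider", "Team X", 12),
    PlayerMatchRecord(9, "Player A", "raider", "Team Y", 5),
    PlayerMatchRecord(9, "Player A", "raider", "Team X", 8),
    PlayerMatchRecord(9, "Player B", "defender", "Team X", 4),
    PlayerMatchRecord(10, "Player A", "raider", "Team X", 15),
]


def filter_by_role(records, role, season):
    """Return the sorted names of players with the given role in a season."""
    return sorted(
        {r.player for r in records if r.role == role and r.season == season}
    )


def average_points_by_opponent(records, player, season):
    """Return {opponent: mean points per match} for one player in one season."""
    points = defaultdict(list)
    for r in records:
        if r.player == player and r.season == season:
            points[r.opponent].append(r.points)
    return {opp: sum(pts) / len(pts) for opp, pts in points.items()}


if __name__ == "__main__":
    print(filter_by_role(RECORDS, "raider", 9))
    print(average_points_by_opponent(RECORDS, "Player A", 9))
```

With real scraped data in place of the placeholder rows, the same grouping pattern could extend to comparing team results with and without a given player.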