Making Chant Computing Easy: CantusCorpus v1.0 and the PyCantus Library

📅 2026-03-12
🤖 AI Summary
This study addresses a long-standing obstacle in digital Gregorian chant scholarship: data in the Cantus ecosystem has been accessible only piecemeal through individual web interfaces, which has limited the use of computational methods. The authors present CantusCorpus v1.0, a unified dataset of nearly 900,000 chant records compiled from the network of databases centered on Cantus Index as of mid-2025, together with code for updating the dataset as those databases grow. Records stay interoperable across databases through the shared Cantus ID mechanism. The authors also present PyCantus, a lightweight, open-source Python library for working with this data. Because PyCantus decouples the data model from the Cantus codebase, further chant data sources can be integrated, as illustrated with pilot data from the Corpus Monodicum project. Together, the corpus and the library aim to make computational chant research more transparent, replicable, and accessible to digital humanities practitioners.

📝 Abstract
Digital Gregorian chant scholarship has for decades enjoyed the privilege of a large digital resource cataloguing chant sources: the Cantus ecosystem, with nearly 900,000 chants catalogued across more than 2000 sources. The Cantus Database data model and the Cantus ID mechanism have been adopted by 18 more chant databases, jointly accessible through the Cantus Index interface. However, this data has only been available piecemeal via the individual online user interfaces; computational methods have so far had only a limited opportunity to process these immense resources. To overcome this hurdle, we compiled CantusCorpus v1.0, a dataset that combines everything that was available across the Cantus Index-centered network of databases as of mid-2025, and we have also provided the code for updating the dataset as the databases grow. We then created the lightweight PyCantus library for working with this data. PyCantus decouples the data model from the Cantus codebase and thus allows integration of further chant data sources, which we illustrate by harmonising pilot data from the Corpus Monodicum project. Computational chant research is attractive, and CantusCorpus v1.0 and PyCantus are infrastructures that should make work in this field more transparent, replicable, and accessible to digital humanities practitioners beyond chant scholars themselves.
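
The role of a shared Cantus ID in combining records from several databases can be illustrated with a minimal sketch in plain Python. It does not use PyCantus and does not reproduce the CantusCorpus schema: the `ChantRecord` class, its field names, the helper functions, and all IDs, incipits, and database names below are hypothetical and serve only to show the idea of grouping records by a common identifier.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ChantRecord:
    """Hypothetical minimal chant record; not the PyCantus data model."""
    cantus_id: str  # identifier shared across databases
    incipit: str    # opening words of the chant text
    database: str   # originating database (illustrative name)


def group_by_cantus_id(records):
    """Group records from any number of databases under their Cantus ID."""
    groups = defaultdict(list)
    for record in records:
        # Strip stray whitespace so that "000001 " and "000001" align.
        groups[record.cantus_id.strip()].append(record)
    return dict(groups)


def shared_across_databases(groups):
    """Return the sorted Cantus IDs attested in more than one database."""
    return sorted(
        cantus_id
        for cantus_id, recs in groups.items()
        if len({r.database for r in recs}) > 1
    )


# Toy data: IDs, incipits, and database names are made up for illustration.
records = [
    ChantRecord("000001", "Toy incipit one", "database_a"),
    ChantRecord("000001 ", "Toy incipit one", "database_b"),
    ChantRecord("000002", "Toy incipit two", "database_a"),
]

groups = group_by_cantus_id(records)
shared = shared_across_databases(groups)
print(shared)
```

Keeping the record type independent of any one database is what lets a further source be added by mapping its fields onto the same structure, which is the kind of decoupling the abstract describes.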
Problem

Research questions and friction points this paper is trying to address.

Gregorian chant
digital scholarship
computational access
data integration
Cantus ecosystem
Innovation

Methods, ideas, or system contributions that make the work stand out.

CantusCorpus
PyCantus
Gregorian chant
digital humanities
computational musicology
Anna Dvořáková
Charles University, Prague, Czech Republic
Tim Eipert
Julius-Maximilians-Universität Würzburg, Würzburg, Germany
Debra Lacoste
Dalhousie University, Halifax, Nova Scotia, Canada
Jan Hajič jr.
Institute of Formal and Applied Linguistics, Charles University in Prague
Natural Language Processing, Artificial Intelligence, Structural Bioinformatics