🤖 AI Summary
OWL is often not expressive enough to define ontology classes formally. To address this limitation, the paper applies monadic second-order logic (MSOL) to ontology classification. Using MSOL, the authors formalize 14 peptide-related classes from the ChEBI chemistry ontology and apply the formalization both to ChEBI and to 119 million molecules from PubChem, producing logic-derived class labels. Because the logical approach only covers the classes that have been formalized, Transformer-based deep learning models are used to scale classification to the whole ChEBI ontology. Training these models on the logic-derived classifications significantly improves their performance. The core contribution is an approach that combines MSOL-based symbolic classification with sub-symbolic learning.
📝 Abstract
Despite its prevalence, OWL is not expressive enough to define ontology classes in many domains. In this paper, we present an approach that allows the use of monadic second-order formalisations for ontology classification. As a case study, we have applied our approach to 14 peptide-related classes from the chemistry ontology ChEBI. For these classes, a monadic second-order logic formalisation has been developed and applied both to ChEBI and to 119 million molecules from the chemistry database PubChem. While this logical approach alone is limited to classifying the specified classes (in our case, (sub)classes of peptides), transformer deep learning models scale classification to the whole of the ChEBI ontology. We show that using the classifications obtained by the logical approach as training data significantly enhances the performance of the deep learning models.
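To give a feel for what a monadic second-order formalisation adds, the sketch below evaluates a textbook MSO sentence by brute force on a tiny molecular graph. The sentence is graph connectedness: for every set X of atoms that is neither empty nor the whole molecule, some bond links an atom in X to an atom outside X. Quantifying over sets of atoms is what makes it monadic second-order. This is an illustration only and not the paper's formalisation; the function name `is_connected_mso`, the atom labels, and the glycine-like toy graph are assumptions made for the example.

```python
from itertools import combinations


def is_connected_mso(atoms, bonds):
    """Brute-force evaluation of the MSO sentence for connectedness.

    For every proper, non-empty subset X of atoms, there must be a bond
    with exactly one endpoint in X. Exponential in the number of atoms,
    so only suitable for toy graphs.
    """
    atoms = list(atoms)
    # Undirected bonds as two-element sets; self-loops are ignored.
    edges = {frozenset(b) for b in bonds if b[0] != b[1]}
    for size in range(1, len(atoms)):          # proper, non-empty subsets
        for subset in combinations(atoms, size):
            x = set(subset)
            crosses = any(len(e & x) == 1 for e in edges)
            if not crosses:
                return False                   # X is cut off from the rest
    return True


# Hypothetical toy graph: heavy atoms of a glycine-like molecule.
atoms = ["N", "CA", "C", "O1", "O2"]
bonds = [("N", "CA"), ("CA", "C"), ("C", "O1"), ("C", "O2")]

# Same atoms with the CA-C bond removed, giving two fragments.
broken_bonds = [("N", "CA"), ("C", "O1"), ("C", "O2")]

print(is_connected_mso(atoms, bonds))
print(is_connected_mso(atoms, broken_bonds))
```

The first call reports a connected molecule and the second reports a disconnected one. A practical classifier would not enumerate subsets like this; the point is only to show the kind of set-level condition such a formalisation can state.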