AI Summary
This work proposes an interactive ontology construction paradigm that bridges the gap between purely manual and fully automated approaches, which are often hindered by laborious processes or insufficient user control, respectively. By leveraging weighted self-organizing maps, the method enables progressive clustering of tabular data while integrating instance grouping with mechanisms for defining conceptual intensions. This approach empowers users to flexibly adjust both the number of clusters and their semantic interpretations, thereby preserving the efficiency of automation while significantly enhancing controllability. As a result, it facilitates interpretable clustering of entities and the generation of high-quality ontological classifications directly from tabular data.
Abstract
Ontologies represent the conceptual knowledge of a domain. At the core of an ontology is the taxonomy of concepts and subconcepts that represent specific entities, which can be complex to build. In many cases, information is available in the form of records describing the characteristics of relevant entities, i.e., tabular data. Identifying patterns and similarities in such data can serve as a basis for identifying concepts and organizing them. However, doing so manually can be challenging, and purely automatic approaches, such as agglomerative clustering or relying on a large language model to analyze the data, can leave the user with overwhelming results and little control. In this paper, we describe a tool that enables the progressive and interactive construction of a taxonomy of concepts by identifying clusters as well as their intensional definitions. To do so, we rely on weighted self-organizing maps as a clustering method because they enable the creation of an arbitrary number of clusters that are distinct with respect to the distributions of values of specific characteristics of the clustered entities. We show that, by integrating this mechanism and others for rapidly creating concepts that group together instances from tabular data, this tool represents a middle ground between purely manual analysis and automatic methods for building ontological taxonomies.
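To illustrate the clustering idea described in the abstract, the following is a minimal sketch of a weighted self-organizing map in Python with NumPy. It is not the tool's implementation. It assumes that "weighted" means per-feature weights applied in the distance computation, so that the user can decide which characteristics drive the grouping; the actual formulation in the paper may differ. The function names (`train_weighted_som`, `assign_clusters`), the one-dimensional map layout, and the linear decay schedules are all hypothetical choices made for illustration.

```python
import numpy as np


def train_weighted_som(data, feature_weights, n_units=2, epochs=50,
                       lr0=0.5, sigma0=1.0, seed=0):
    """Train a 1-D SOM whose distance is scaled by per-feature weights.

    Assumption: a feature with weight 0 is ignored when choosing the
    best-matching unit, so clusters are driven by the weighted features.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    w = np.asarray(feature_weights, dtype=float)

    # Initialise each unit from a randomly chosen record.
    units = data[rng.choice(len(data), size=n_units, replace=False)].copy()
    positions = np.arange(n_units)  # unit coordinates on the 1-D map

    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                    # decaying learning rate
        sigma = max(sigma0 * (1.0 - frac), 1e-3)   # shrinking neighbourhood
        for x in data[rng.permutation(len(data))]:
            # Weighted squared Euclidean distance to every unit.
            dist = (((units - x) ** 2) * w).sum(axis=1)
            bmu = np.argmin(dist)
            # Gaussian neighbourhood around the best-matching unit.
            h = np.exp(-((positions - bmu) ** 2) / (2.0 * sigma ** 2))
            units += lr * h[:, None] * (x - units)
    return units


def assign_clusters(data, units, feature_weights):
    """Assign each record to its nearest unit under the weighted distance."""
    data = np.asarray(data, dtype=float)
    w = np.asarray(feature_weights, dtype=float)
    dist = (((data[:, None, :] - units[None, :, :]) ** 2) * w).sum(axis=2)
    return dist.argmin(axis=1)


if __name__ == "__main__":
    # Toy table: column 0 separates two groups, column 1 is noise.
    table = [[0.0, 5.0], [0.1, -3.0], [0.2, 8.0],
             [10.0, -4.0], [10.1, 6.0], [10.2, 0.0]]
    weights = [1.0, 0.0]  # cluster on column 0 only
    som_units = train_weighted_som(table, weights, n_units=2)
    print(assign_clusters(table, som_units, weights))
```

In this sketch, `n_units` sets the number of clusters and `feature_weights` sets which characteristics they are distinct on. These two controls correspond to the kind of user adjustment the abstract describes, under the assumptions stated above.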