🤖 AI Summary
This paper addresses the novel task of “universal concept extraction” for domain ontology construction—identifying *all* relevant concepts from documents (not merely salient phrases)—to support ontology coverage assessment and automated ontology learning. To this end, we propose the first unsupervised, large language model–based concept extraction framework, leveraging carefully engineered prompts to enable fine-grained, domain-adaptive concept discovery. Our approach requires neither annotated data nor domain-specific lexicons, ensuring strong generalizability and intrinsic interpretability. Evaluated on two established benchmarks, it achieves substantial F1-score improvements over prior state-of-the-art methods. To foster reproducibility and further research, we publicly release both the source code and benchmark datasets.
📝 Abstract
In this paper, we present an approach for concept extraction from documents using pre-trained large language models (LLMs). Whereas conventional methods extract keyphrases that summarize the important information in a document, our approach tackles the more challenging task of extracting all concepts present in a document that relate to a specific domain, not just the salient ones. Through comprehensive evaluations on two widely used benchmark datasets, we demonstrate that our method improves the F1 score over state-of-the-art techniques. Additionally, we explore the potential of prompting these models for unsupervised concept extraction. The extracted concepts are intended to support domain-coverage evaluation of ontologies and to facilitate ontology learning, highlighting the effectiveness of LLMs in concept extraction tasks. Our source code and datasets are publicly available at https://github.com/ISE-FIZKarlsruhe/concept_extraction.
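To make the idea of prompt-based concept extraction concrete, here is a minimal, hedged sketch. It is not the authors' actual pipeline: the prompt wording, function names, and the stubbed `query_llm` (which stands in for a real LLM API call) are all illustrative assumptions. Only the overall pattern — prompt the model for *all* domain-related concepts, then parse its line-separated answer — reflects the approach described above.

```python
# Hypothetical sketch of prompt-based, unsupervised concept extraction.
# The LLM call is stubbed so the example is self-contained and runnable;
# a real system would query a pre-trained model instead.

def build_prompt(domain: str, document: str) -> str:
    """Assemble an instruction prompt asking for ALL domain-related
    concepts in the text, not just the most salient keyphrases."""
    return (
        f"List every concept related to the domain '{domain}' that is "
        f"mentioned in the following text, one concept per line:\n\n"
        f"{document}"
    )

def query_llm(prompt: str) -> str:
    # Stub standing in for a real LLM call (e.g. a chat-completion
    # endpoint). Returns a fixed answer purely for illustration.
    return "Ontology\nConcept Extraction\nKnowledge Graph"

def extract_concepts(domain: str, document: str) -> list[str]:
    """Prompt the model, then parse its line-separated answer into a
    deduplicated, lowercased list of concepts."""
    raw = query_llm(build_prompt(domain, document))
    seen: set[str] = set()
    concepts: list[str] = []
    for line in raw.splitlines():
        concept = line.strip().lower().strip("-* ")
        if concept and concept not in seen:
            seen.add(concept)
            concepts.append(concept)
    return concepts

print(extract_concepts("semantic web",
                       "Ontologies and knowledge graphs support ..."))
```

Because no annotated data or domain lexicon is involved, the only domain-specific component in this pattern is the `domain` string interpolated into the prompt, which is what makes the method domain-adaptive.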