Leveraging Large Language Models for Automated Definition Extraction with TaxoMatic: A Case Study on Media Bias

📅 2025-04-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the low efficiency and poor scalability of manually extracting concept definitions from academic literature in media bias research. It proposes TaxoMatic, presented as the first LLM-driven, end-to-end framework for domain-specific knowledge structuring focused on automatic definition extraction. TaxoMatic is a three-stage pipeline: data acquisition, relevance classification, and definition extraction. It relies on rule-augmented prompt engineering and on fine-tuning large language models (e.g., Claude-3-sonnet) with human-annotated data. On a corpus of 2,398 manually rated scholarly articles, Claude-3-sonnet achieves the best results in both relevance classification and definition extraction, which supports the feasibility of applying LLMs systematically to academic definition extraction. The core contribution is a domain-adapted definition extraction approach and a reproducible route to building high-quality, domain-specific terminology knowledge bases.

📝 Abstract
This paper introduces TaxoMatic, a framework that leverages large language models to automate definition extraction from academic literature. Focusing on the media bias domain, the framework encompasses data collection, LLM-based relevance classification, and extraction of conceptual definitions. Evaluated on a dataset of 2,398 manually rated articles, the study demonstrates the framework's effectiveness, with Claude-3-sonnet achieving the best results in both relevance classification and definition extraction. Future directions include expanding datasets and applying TaxoMatic to additional domains.
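The flow described in the abstract (collect articles, classify relevance, extract definitions) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the prompt wording, the `Article` and `Definition` types, `run_pipeline`, and `toy_llm` are all hypothetical names introduced here, and the keyword-based `toy_llm` only stands in for a real model call so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Article:
    title: str
    text: str


@dataclass
class Definition:
    article_title: str
    text: str


# Hypothetical prompt templates; the paper's actual prompts are not given here.
RELEVANCE_PROMPT = (
    "Is the following article relevant to media bias? Answer yes or no.\n\n{text}"
)
EXTRACTION_PROMPT = (
    "Quote the sentence that defines a media bias concept, or answer NONE.\n\n{text}"
)


def run_pipeline(articles: list[Article], llm: Callable[[str], str]) -> list[Definition]:
    """Classify each collected article for relevance, then extract a definition."""
    definitions = []
    for article in articles:
        # Relevance classification: skip articles the model judges off-topic.
        answer = llm(RELEVANCE_PROMPT.format(text=article.text))
        if not answer.strip().lower().startswith("yes"):
            continue
        # Definition extraction: keep the model's quote unless it reports none.
        extracted = llm(EXTRACTION_PROMPT.format(text=article.text)).strip()
        if extracted and extracted.upper() != "NONE":
            definitions.append(Definition(article.title, extracted))
    return definitions


def toy_llm(prompt: str) -> str:
    """Keyword-based stand-in for a real LLM call (assumption, for offline use)."""
    body = prompt.split("\n\n", 1)[1]
    if prompt.startswith("Is the following"):
        return "yes" if "bias" in body.lower() else "no"
    for sentence in body.split(". "):
        if " is defined as " in sentence:
            return sentence.strip().rstrip(".") + "."
    return "NONE"
```

Passing the model as a plain callable keeps the sketch independent of any particular provider; a real deployment would replace `toy_llm` with a function that sends the prompt to the chosen model and returns its text response.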
Problem

Research questions and friction points this paper is trying to address.

Automate definition extraction using large language models
Focus on media bias domain for conceptual definitions
Evaluate framework effectiveness on manually rated articles
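One way to score a relevance classifier against manual ratings is with precision, recall, and F1. The sketch below assumes binary relevant/not-relevant labels and assumes these metrics; this page does not state which metrics the paper reports, and `precision_recall_f1` is a hypothetical helper name.

```python
def precision_recall_f1(gold: list[bool], predicted: list[bool]) -> tuple[float, float, float]:
    """Compare predicted relevance labels with manual ratings (True = relevant)."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    true_pos = sum(1 for g, p in zip(gold, predicted) if g and p)
    false_pos = sum(1 for g, p in zip(gold, predicted) if not g and p)
    false_neg = sum(1 for g, p in zip(gold, predicted) if g and not p)
    # Guard against division by zero when a class is never predicted or present.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For example, with manual ratings `[True, True, False, False]` and predictions `[True, False, True, False]`, there is one true positive, one false positive, and one false negative, so all three scores are 0.5.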
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages LLMs for automated definition extraction
Integrates data collection and relevance classification
Demonstrates effectiveness with Claude-3-sonnet model