Leveraging Large Language Models for Automated Definition Extraction with TaxoMatic: A Case Study on Media Bias

📅 2025-04-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the low efficiency and poor scalability of manually extracting concept definitions from academic literature in media bias research. To this end, it proposes TaxoMatic, the first LLM-driven, end-to-end framework for domain-specific knowledge structuring focused on automatic definition extraction. TaxoMatic is a three-stage pipeline (data acquisition, relevance classification, and definition extraction) that combines rule-augmented prompt engineering with fine-tuning of large language models (e.g., Claude-3-Sonnet) on human-annotated data. Evaluated on a corpus of 2,398 manually annotated scholarly articles, TaxoMatic significantly outperforms baseline methods in both relevance classification and definition extraction, demonstrating that LLMs can be applied systematically to academic definition extraction. Its core contribution is a domain-adapted definition extraction paradigm and a reproducible technical pathway for building high-quality, domain-specific terminology knowledge bases.

📝 Abstract

This paper introduces TaxoMatic, a framework that leverages large language models to automate definition extraction from academic literature. Focusing on the media bias domain, the framework encompasses data collection, LLM-based relevance classification, and extraction of conceptual definitions. Evaluated on a dataset of 2,398 manually rated articles, the study demonstrates the framework's effectiveness, with Claude-3-Sonnet achieving the best results in both relevance classification and definition extraction. Future directions include expanding datasets and applying TaxoMatic to additional domains.
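
The stages named in the abstract (collected articles, LLM-based relevance classification, definition extraction) can be pictured with a minimal Python sketch. It is not the authors' implementation: the `Article` type, the prompts, the function names, and the `fake_llm` stand-in are all hypothetical, and a real run would replace `fake_llm` with a call to an actual model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical interface: any function mapping a prompt to a model reply.
LLM = Callable[[str], str]


@dataclass
class Article:
    title: str
    text: str


def is_relevant(article: Article, llm: LLM) -> bool:
    """Stage 2 (assumed prompt): ask the model for a YES/NO relevance label."""
    prompt = (
        "Does the following article discuss media bias? Answer YES or NO.\n\n"
        f"Title: {article.title}\n\n{article.text}"
    )
    return llm(prompt).strip().upper().startswith("YES")


def extract_definitions(article: Article, llm: LLM) -> List[str]:
    """Stage 3 (assumed prompt): ask for one definition per line, or NONE."""
    prompt = (
        "List every explicit definition of a concept in the article below, "
        "one per line. Reply NONE if there are none.\n\n" + article.text
    )
    reply = llm(prompt).strip()
    if reply.upper() == "NONE":
        return []
    return [line.strip() for line in reply.splitlines() if line.strip()]


def run_pipeline(articles: List[Article], llm: LLM) -> Dict[str, List[str]]:
    """Keep only relevant articles, then extract definitions from each."""
    results: Dict[str, List[str]] = {}
    for article in articles:
        if is_relevant(article, llm):
            results[article.title] = extract_definitions(article, llm)
    return results


def fake_llm(prompt: str) -> str:
    """Deterministic stand-in for a real model, for illustration only."""
    if prompt.startswith("Does"):
        return "YES" if "framing" in prompt.lower() else "NO"
    return "Framing bias is the selective emphasis of certain aspects of a story."


# Stage 1 (data collection) is represented here by a hand-written list.
articles = [
    Article("A", "Framing bias is the selective emphasis of certain aspects of a story."),
    Article("B", "A recipe for sourdough bread."),
]
results = run_pipeline(articles, fake_llm)
```

Passing the model in as a plain callable keeps the sketch independent of any particular provider, which also makes it easy to compare several models on the same annotated articles.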

Problem

Research questions and friction points this paper is trying to address.

Automate definition extraction using large language models

Focus on media bias domain for conceptual definitions

Evaluate framework effectiveness on manually rated articles

Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages LLMs for automated definition extraction

Integrates data collection and relevance classification

Demonstrates effectiveness with the Claude-3-Sonnet model

🔎 Similar Papers

LangBiTe: A Platform for Testing Bias in Large Language Models