LLMTaxo: Leveraging Large Language Models for Constructing Taxonomy of Factual Claims from Social Media

📅 2025-04-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the lack of systematic, structured taxonomies for factual claims on social media, this paper proposes the first framework for automatically generating multi-granular taxonomies designed specifically for factual statements. Methodologically, it combines prompt engineering with a large language model (GPT-4), hierarchical topic modeling, and iterative human-in-the-loop refinement, together with a new taxonomy quality evaluation framework that assesses structural coherence, semantic consistency, and coverage completeness. Experiments on three real-world social media datasets show that the framework significantly improves classification consistency, by up to 32%, and reveal that performance varies by dataset. Key contributions include: (1) the first taxonomy generation paradigm explicitly oriented toward factual claims; (2) an interpretable, multi-dimensional framework for assessing taxonomy quality; and (3) a human–AI collaborative pipeline for constructing high-quality taxonomies.

📝 Abstract
With the vast expansion of content on social media platforms, analyzing and comprehending online discourse has become increasingly complex. This paper introduces LLMTaxo, a novel framework that leverages large language models to automatically construct a taxonomy of factual claims from social media by generating topics at multiple levels of granularity. This approach helps stakeholders navigate the social media landscape more effectively. We implement this framework with different models across three distinct datasets and introduce specially designed taxonomy evaluation metrics for a comprehensive assessment. Evaluations by both human evaluators and GPT-4 indicate that LLMTaxo effectively categorizes factual claims from social media, and they reveal that certain models perform better on specific datasets.
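The abstract describes the taxonomy only at a high level: topics are generated at several granularities, and claims are organized under them. A minimal sketch of the resulting data structure follows, assuming a fixed three-level hierarchy (broad, medium, fine) and a function that assigns one topic path to each claim. The names `build_taxonomy`, `keyword_labeler`, and `Labeler` are hypothetical, and the keyword stub merely stands in for the LLM prompting step; it is not LLMTaxo's actual method.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical labeler type: maps a claim to a (broad, medium, fine) topic path.
# In LLMTaxo an LLM plays this role; here any callable with this shape works.
Labeler = Callable[[str], Tuple[str, str, str]]
Taxonomy = Dict[str, Dict[str, Dict[str, List[str]]]]


def build_taxonomy(claims: List[str], label: Labeler) -> Taxonomy:
    """Group claims into a three-level tree: broad -> medium -> fine -> claims."""
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for claim in claims:
        broad, medium, fine = label(claim)
        tree[broad][medium][fine].append(claim)
    # Convert nested defaultdicts to plain dicts for printing or serialization.
    return {b: {m: dict(f) for m, f in ms.items()} for b, ms in tree.items()}


def keyword_labeler(claim: str) -> Tuple[str, str, str]:
    """Hypothetical stand-in for an LLM call: assigns topics by keyword."""
    text = claim.lower()
    if "vaccine" in text:
        fine = "Vaccine safety" if "safe" in text else "Vaccine efficacy"
        return ("Health", "Vaccines", fine)
    return ("Other", "Uncategorized", "Uncategorized")


example_claims = [
    "The vaccine is safe for children.",
    "The vaccine reduces hospitalization by half.",
    "The bridge was built in 1990.",
]
taxonomy = build_taxonomy(example_claims, keyword_labeler)

if __name__ == "__main__":
    for broad, mediums in taxonomy.items():
        for medium, fines in mediums.items():
            for fine, items in fines.items():
                print(f"{broad} > {medium} > {fine}: {len(items)} claim(s)")
```

Keeping the labeler as a plain callable separates the hierarchy-building logic from how topics are produced, so a real LLM-backed labeler could replace the keyword stub without changing `build_taxonomy`.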
Problem

Research questions and friction points this paper is trying to address.

Automates taxonomy construction for social media claims
Generates topics at multiple levels of granularity
Evaluates model performance on diverse datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages large language models for taxonomy construction
Generates topics at multiple levels of granularity
Introduces specially designed taxonomy evaluation metrics