LLMTaxo: Leveraging Large Language Models for Constructing Taxonomy of Factual Claims from Social Media

📅 2025-04-11

🤖 AI Summary

To address the lack of systematic, structured taxonomies for factual claims on social media, this paper proposes the first framework for automatically generating multi-granular taxonomies designed specifically for factual statements. Methodologically, it combines large language model (GPT-4) prompt engineering, hierarchical topic modeling, and iterative human-in-the-loop refinement. It also introduces a new taxonomy quality evaluation framework that assesses structural coherence, semantic consistency, and coverage completeness. Experiments on three real-world social media datasets show that the framework significantly improves classification consistency, by up to 32%, and reveal that performance varies across datasets. Key contributions include: (1) the first taxonomy generation paradigm explicitly oriented toward factual claims; (2) an interpretable, multi-dimensional framework for assessing taxonomy quality; and (3) a human–AI collaborative pipeline for constructing high-quality taxonomies.

📝 Abstract

With the vast expansion of content on social media platforms, analyzing and comprehending online discourse has become increasingly complex. This paper introduces LLMTaxo, a novel framework that leverages large language models to automatically construct a taxonomy of factual claims from social media by generating topics at multiple levels of granularity. This approach helps stakeholders navigate the social media landscape more effectively. We implement this framework with different models across three distinct datasets and introduce specially designed taxonomy evaluation metrics for a comprehensive assessment. Evaluations by both human evaluators and GPT-4 indicate that LLMTaxo effectively categorizes factual claims from social media, and they reveal that certain models perform better on specific datasets.
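The abstract describes generating topics at several granularities for each claim and organizing them into a taxonomy. The Python sketch below shows one plausible way to do the aggregation step. It assumes that an upstream LLM step (not shown) has already assigned each claim a coarse-to-fine topic path. The three-level paths, the label-normalization rule, the example topics, and the names `TopicNode`, `build_taxonomy`, and `render` are all hypothetical illustrations, not LLMTaxo's actual implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TopicNode:
    """A taxonomy node: a topic label, the number of claims under it, and its subtopics."""
    name: str
    count: int = 0
    children: dict[str, TopicNode] = field(default_factory=dict)


def build_taxonomy(topic_paths):
    """Merge per-claim topic paths (coarse -> fine) into a single tree.

    Hypothetical input: one tuple per claim, e.g.
    ("Health", "Vaccines", "Vaccine side effects"), assumed to come from an LLM prompt.
    """
    root = TopicNode("ROOT")
    for path in topic_paths:
        root.count += 1
        node = root
        for label in path:
            # Assumption: collapse whitespace and case so near-duplicate labels merge.
            key = " ".join(label.split()).lower()
            if key not in node.children:
                node.children[key] = TopicNode(label)
            node = node.children[key]
            node.count += 1
    return root


def render(node, depth=0):
    """Return an indented outline, most frequent topics first (ties keep insertion order)."""
    lines = []
    for child in sorted(node.children.values(), key=lambda n: -n.count):
        lines.append("  " * depth + f"{child.name} ({child.count})")
        lines.extend(render(child, depth + 1))
    return lines


# Usage with made-up topic paths for four claims:
claims_topics = [
    ("Health", "Vaccines", "Vaccine side effects"),
    ("Health", "Vaccines", "Vaccine efficacy"),
    ("Health", "Masks", "Mask effectiveness"),
    ("Politics", "Elections", "Voter fraud"),
]
tree = build_taxonomy(claims_topics)
print("\n".join(render(tree)))
```

Merging on a normalized key is a deliberately simple choice. A real pipeline would likely need semantic deduplication (for example, having an LLM or embedding similarity decide that two differently worded topics are the same). That step is outside this sketch.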

Problem

Automates taxonomy construction for social media claims

Generates topics at multiple levels of granularity

Evaluates model performance on diverse datasets

Innovation

Leverages large language models for taxonomy construction

Generates topics at multiple levels of granularity

Introduces specially designed evaluation metrics
