🤖 AI Summary
To address the lack of systematic, structured taxonomies for factual claims on social media, this paper proposes the first multi-granular taxonomy auto-generation framework specifically designed for factual claims. Methodologically, it integrates prompt engineering driven by a large language model (GPT-4), hierarchical topic modeling, and human-in-the-loop iterative refinement, alongside a novel taxonomy quality evaluation framework that assesses structural coherence, semantic consistency, and coverage completeness. Experiments on three real-world social media datasets demonstrate that the framework significantly improves classification consistency, by up to 32%, and reveal dataset-dependent performance variations. Key contributions include: (1) establishing the first taxonomy generation paradigm explicitly oriented toward factual claims; (2) introducing an interpretable, multi-dimensional taxonomy quality assessment framework; and (3) enabling a high-quality, human–AI collaborative taxonomy construction pipeline.
📝 Abstract
With the vast expansion of content on social media platforms, analyzing and comprehending online discourse has become increasingly complex. This paper introduces LLMTaxo, a novel framework that leverages large language models to automatically construct a taxonomy of factual claims from social media by generating topics at multiple levels of granularity. This approach helps stakeholders navigate the social media landscape more effectively. We implement the framework with different models across three distinct datasets and introduce specially designed taxonomy evaluation metrics for a comprehensive assessment. Evaluations by both human evaluators and GPT-4 indicate that LLMTaxo effectively categorizes factual claims from social media and reveal that certain models perform better on specific datasets.