🤖 AI Summary
Existing user profiling methods suffer from poor transferability, uninterpretable features, heavy reliance on large-scale labeled data, and inflexible pre-defined categories. To address these limitations, this paper proposes a two-stage large language model (LLM)-based user profiling framework. In the first stage, domain-defining statements serve as anchors for semi-supervised filtering that selects high-confidence samples. In the second stage, the framework jointly generates abstractive natural-language profiles and extracts representative tweets, enabling cross-domain adaptability. The method integrates a domain knowledge base, LLM-driven abstraction generation, and extractive tweet selection, and includes a human-validated evaluation protocol to ensure reliability. Evaluated on a Persian political Twitter dataset, the framework outperforms the state of the art by 9.8% in profiling accuracy, produces interpretable, linguistically coherent user profiles, and improves generalization on downstream tasks.
📝 Abstract
Social media user profiling through content analysis is crucial for tasks such as misinformation detection, engagement prediction, hate speech monitoring, and user behavior modeling. However, existing profiling techniques, including tweet summarization, attribute-based profiling, and latent representation learning, face significant limitations: they often lack transferability, produce non-interpretable features, require large labeled datasets, or rely on rigid predefined categories that limit adaptability. We introduce a novel large language model (LLM)-based approach that leverages domain-defining statements, key characteristics that outline the core pillars of a domain, as the foundation for profiling. Our two-stage method first employs semi-supervised filtering with a domain-specific knowledge base, then generates both abstractive (synthesized descriptions) and extractive (representative tweet selections) user profiles. By harnessing LLMs' inherent knowledge with minimal human validation, our approach adapts across domains while reducing the need for large labeled datasets. It generates interpretable natural-language user profiles, condensing extensive user data to a scale that unlocks LLMs' reasoning and knowledge capabilities for downstream social network tasks. We contribute a Persian political Twitter (X) dataset and an LLM-based evaluation framework with human validation. Experimental results show that our method significantly outperforms state-of-the-art LLM-based and traditional methods by 9.8%, demonstrating its effectiveness in creating flexible, adaptable, and interpretable user profiles.
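The two-stage pipeline described above can be sketched in miniature. This is a hedged illustration only: the paper does not publish code, so every function name here is hypothetical, the keyword-overlap `similarity` is a toy stand-in for the embedding- or LLM-based matching against the domain knowledge base, and the abstractive step stubs out the actual LLM call.

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    user: str
    text: str


def similarity(text: str, statement: str) -> float:
    """Toy Jaccard word overlap; a proxy for semantic similarity
    between a tweet and a domain-defining statement."""
    a, b = set(text.lower().split()), set(statement.lower().split())
    return len(a & b) / max(len(a | b), 1)


def filter_high_confidence(tweets, domain_statements, threshold=0.2):
    """Stage 1: semi-supervised filtering. Keep only tweets whose best
    match to any domain-defining statement clears the threshold."""
    return [
        t for t in tweets
        if max(similarity(t.text, s) for s in domain_statements) >= threshold
    ]


def generate_profile(user, tweets, llm=None):
    """Stage 2: joint abstractive + extractive profile generation.
    In the real framework an LLM synthesizes the abstractive summary;
    here it is stubbed. The extractive pick uses length as a toy proxy
    for 'most representative'."""
    texts = [t.text for t in tweets if t.user == user]
    abstractive = llm(texts) if llm else f"{user}: profile over {len(texts)} filtered tweets"
    extractive = max(texts, key=len)
    return {"user": user, "abstract": abstractive, "tweets": [extractive]}
```

In this sketch, off-topic tweets are dropped in stage 1, and stage 2 pairs a natural-language summary with a representative tweet, mirroring the abstractive/extractive split the paper describes.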