Assessing Capabilities of Large Language Models in Social Media Analytics: A Multi-task Quest

📅 2026-04-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the lack of systematic evaluation of large language models (LLMs) across multiple social media analytics tasks, specifically authorship verification, post generation, and user attribute inference. The authors build a unified evaluation framework and assess prominent models (GPT-4, GPT-4o, GPT-3.5-Turbo, Gemini 1.5 Pro, DeepSeek-V3, Llama 3.2, and BERT) on a Twitter (X) dataset. For authorship verification, they introduce a systematic sampling framework over user and post selection strategies and test generalization on tweets collected from January 2024 onward to mitigate “seen-data” bias. They also run a user study on how real users perceive LLM-generated posts conditioned on their own writing, and annotate occupations and interests using the IAB Tech Lab 2023 and 2018 U.S. Standard Occupational Classification (SOC) taxonomies to benchmark attribute inference. Their findings reveal significant performance disparities across models in generating authentic content, generalizing on verification tasks, and inferring user attributes, offering new insights for LLM-driven social media analysis.

📝 Abstract

In this study, we present the first comprehensive evaluation of modern LLMs (including GPT-4, GPT-4o, GPT-3.5-Turbo, Gemini 1.5 Pro, DeepSeek-V3, Llama 3.2, and BERT) across three core social media analytics tasks on a Twitter (X) dataset: (I) Social Media Authorship Verification, (II) Social Media Post Generation, and (III) User Attribute Inference. For authorship verification, we introduce a systematic sampling framework over diverse user and post selection strategies and evaluate generalization on newly collected tweets from January 2024 onward to mitigate "seen-data" bias. For post generation, we assess the ability of LLMs to produce authentic, user-like content using comprehensive evaluation metrics. Bridging Tasks I and II, we conduct a user study to measure real users' perceptions of LLM-generated posts conditioned on their own writing. For attribute inference, we annotate occupations and interests using two standardized taxonomies (IAB Tech Lab 2023 and 2018 U.S. SOC) and benchmark LLMs against existing baselines. Overall, our unified evaluation provides new insights and establishes reproducible benchmarks for LLM-driven social media analytics. The code and data are provided in the supplementary material and will also be made publicly available upon publication.
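
The temporal holdout mentioned in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the record fields (`user`, `text`, `created_at`), the function names, and the exhaustive pairing of posts are all assumptions made for the example. The paper's actual sampling framework relies on user and post selection strategies that the abstract does not spell out.

```python
from datetime import datetime, timezone
from itertools import combinations

# Cutoff taken from the abstract ("January 2024 onward"); the exact day is an assumption.
CUTOFF = datetime(2024, 1, 1, tzinfo=timezone.utc)


def split_by_cutoff(tweets, cutoff=CUTOFF):
    """Split tweets into (older, newer) lists relative to the cutoff date.

    Each tweet is a dict with hypothetical keys: 'user', 'text', 'created_at'.
    """
    older = [t for t in tweets if t["created_at"] < cutoff]
    newer = [t for t in tweets if t["created_at"] >= cutoff]
    return older, newer


def make_verification_pairs(tweets):
    """Build (text_a, text_b, label) triples; label is 1 when both posts share an author.

    Exhaustive pairing is used here only for brevity; it stands in for
    whatever selection strategy a real study would apply.
    """
    pairs = []
    for a, b in combinations(tweets, 2):
        pairs.append((a["text"], b["text"], int(a["user"] == b["user"])))
    return pairs
```

Evaluating a model only on pairs built from the `newer` list is one way to reduce the chance that the test posts appeared in its training data, which is the concern the abstract calls "seen-data" bias.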

Problem

Research questions and friction points this paper is trying to address.

Large Language Models

Social Media Analytics

Authorship Verification

Post Generation

User Attribute Inference

Innovation

Methods, ideas, or system contributions that make the work stand out.

Large Language Models

Social Media Analytics

Authorship Verification