🤖 AI Summary
This study investigates whether web-enabled large language models (LLMs) can infer sensitive demographic attributes—such as gender and political orientation—from publicly accessible social media profiles (e.g., X/Twitter) using only usernames, without API access or authentication.
Method: Leveraging multi-step reasoning and real-time HTML parsing, the authors evaluate LLMs on a synthetic dataset of 48 accounts and validate the findings against survey data from 1,384 real international users.
Contribution/Results: The work provides the first empirical demonstration that LLMs can autonomously retrieve and interpret social profile content to perform demographic inference. Results reveal systematic biases—particularly for low-activity accounts—and demonstrate prediction accuracy sufficient to support applications in computational social science. Critically, it uncovers a novel privacy threat: inference of sensitive attributes without user consent, API keys, or login credentials. This establishes the first empirical baseline for LLM-powered social analysis and highlights risks of misuse in targeted advertising and information manipulation.
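The paper itself does not publish its pipeline, but the setup described above (a web-browsing LLM given only a username, which fetches the public profile page and reasons over it step by step) can be sketched roughly as below. The profile URL template, the prompt wording, and the `build_inference_prompt` helper are illustrative assumptions, not the authors' implementation; the HTML is passed in directly so the sketch stays self-contained and offline.

```python
import re

# Assumed public profile URL pattern (no API keys, no login).
PROFILE_URL = "https://x.com/{username}"

def build_inference_prompt(username: str, profile_html: str) -> str:
    """Assemble a multi-step reasoning prompt from raw profile HTML.

    In the study's setting the web-enabled LLM retrieves and parses the
    live page itself; here we strip tags locally as a stand-in for that
    step so the example runs without network access.
    """
    # Crude visible-text extraction: drop tags, collapse whitespace.
    text = re.sub(r"<[^>]+>", " ", profile_html)
    text = re.sub(r"\s+", " ", text).strip()
    return (
        f"Profile page for @{username} "
        f"({PROFILE_URL.format(username=username)}):\n{text}\n\n"
        "Step 1: Summarize the bio and any visible posts.\n"
        "Step 2: List demographic cues; note if activity is minimal.\n"
        "Step 3: Output inferred gender and political orientation, "
        "each with a confidence level."
    )

if __name__ == "__main__":
    sample = "<div><h1>Jane Doe</h1><p>Climate researcher. Posts about policy.</p></div>"
    print(build_inference_prompt("janedoe", sample))
```

The explicit "Step 1/2/3" scaffold mirrors the multi-step reasoning the summary attributes to the models; the caveat in Step 2 reflects the paper's finding that minimal-activity accounts are where biased inferences tend to arise.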
📝 Abstract
Large language models (LLMs) have traditionally relied on static training data, limiting their knowledge to fixed snapshots. Recent advancements, however, have equipped LLMs with web-browsing capabilities, enabling real-time information retrieval and multi-step reasoning over live web content. While prior studies have demonstrated LLMs' ability to access and analyze websites, their capacity to directly retrieve and analyze social media data remains unexplored. Here, we evaluate whether web-browsing LLMs can infer demographic attributes of social media users given only their usernames. Using a synthetic dataset of 48 X (Twitter) accounts and a survey dataset of 1,384 international participants, we show that these models can access social media content and predict user demographics with reasonable accuracy. Analysis of the synthetic dataset further reveals how LLMs parse and interpret social media profiles, which may introduce gender and political biases against accounts with minimal activity. While this capability holds promise for computational social science in the post-API era, it also raises risks of misuse, particularly in information operations and targeted advertising, underscoring the need for safeguards. We recommend that LLM providers restrict this capability in public-facing applications while preserving controlled access for verified research purposes.