🤖 AI Summary
This study systematically investigates the emotional consistency and semantic coherence of LLM-generated text in social media contexts, specifically climate-related discourse on Twitter and Reddit, comparing the Gemma and Llama models. We propose the first dual-dimensional "affective–semantic" evaluation framework tailored to social text, integrating VADER and Ekman-based emotion analysis, BERTScore for semantic similarity, and cross-platform statistical testing. Our findings reveal three previously undocumented phenomena: (1) pervasive attenuation of emotional intensity, (2) a systematic positive-sentiment bias, and (3) model-specific disparities in affective modeling: Llama yields more balanced emotion distributions, whereas Gemma amplifies anger while reinforcing optimism. Both models achieve high semantic coherence but adapt weakly across tasks: emotional preservation and response quality in reply generation significantly underperform those in continuation tasks. These results establish a novel paradigm for trustworthy LLM evaluation and affective alignment in social media contexts.
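The "attenuation of emotional intensity" finding can be illustrated with a minimal sketch. The helper below is a hypothetical metric, not the paper's exact formula: it assumes each text has already been scored with a VADER-style compound value in [-1, 1], and reports the drop in mean absolute intensity from human-authored texts to model generations. All scores shown are fabricated for illustration.

```python
from statistics import mean

def intensity_attenuation(human_scores, model_scores):
    """Mean drop in absolute sentiment intensity from human to model text.

    Hypothetical metric: scores are assumed to be VADER-style compound
    values in [-1, 1]. A positive result means the model's text is
    emotionally flatter than the human source.
    """
    human_intensity = mean(abs(s) for s in human_scores)
    model_intensity = mean(abs(s) for s in model_scores)
    return human_intensity - model_intensity

# Illustrative (fabricated) compound scores for paired texts.
human = [-0.82, 0.65, -0.74, 0.91]
model = [-0.41, 0.38, -0.22, 0.55]
print(round(intensity_attenuation(human, model), 3))
```

Using the sign-stripped mean keeps strongly negative and strongly positive texts from cancelling out, so the metric captures flattening rather than a shift in polarity (which the positivity-bias finding addresses separately).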
📝 Abstract
Large Language Models (LLMs) demonstrate remarkable capabilities in text generation, yet their emotional consistency and semantic coherence in social media contexts remain insufficiently understood. This study investigates how LLMs handle emotional content and maintain semantic relationships through continuation and response tasks using two open-source models: Gemma and Llama. By analyzing climate change discussions from Twitter and Reddit, we examine emotional transitions, intensity patterns, and semantic similarity between human-authored and LLM-generated content. Our findings reveal that while both models maintain high semantic coherence, they exhibit distinct emotional patterns: Gemma shows a tendency toward negative emotion amplification, particularly anger, while maintaining certain positive emotions like optimism. Llama demonstrates superior emotional preservation across a broader range of emotions. Both models systematically generate responses with attenuated emotional intensity compared to human-authored content and show a bias toward positive emotions in response tasks. Additionally, both models maintain strong semantic similarity with original texts, though performance varies between continuation and response tasks. These findings provide insights into LLMs' emotional and semantic processing capabilities, with implications for their deployment in social media contexts and human-AI interaction design.
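The reported bias toward positive emotions can be framed as a shift in emotion-label distributions. The sketch below is an illustrative reconstruction under stated assumptions, not the study's pipeline: it assumes per-text Ekman labels from some upstream classifier, and compares label proportions between human posts and model generations. The label lists are fabricated examples.

```python
from collections import Counter

# Ekman's six basic emotion categories.
EKMAN = ["anger", "disgust", "fear", "joy", "sadness", "surprise"]

def emotion_distribution(labels):
    """Proportion of each Ekman emotion in a list of per-text labels.

    Assumes labels come from an upstream Ekman-style classifier;
    categories absent from the sample get proportion 0.
    """
    counts = Counter(labels)
    total = len(labels)
    return {e: counts.get(e, 0) / total for e in EKMAN}

# Illustrative (fabricated) labels for human posts vs. model responses.
human = ["anger", "anger", "fear", "sadness", "joy", "anger"]
model = ["joy", "joy", "sadness", "joy", "fear", "joy"]

# Per-emotion shift: positive values mean the model over-represents
# that emotion relative to the human source texts.
shift = {e: emotion_distribution(model)[e] - emotion_distribution(human)[e]
         for e in EKMAN}
print(shift["joy"], shift["anger"])
```

A positive shift on joy together with a negative shift on anger would mirror the positivity bias described above; in the actual study such shifts would be checked with cross-platform statistical tests rather than read off raw proportions.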