🤖 AI Summary
The proliferation of AI-generated content (AIGC) poses escalating risks of misinformation, yet existing research predominantly focuses on fact-checking while neglecting how AIGC shapes human perception and behavioral responses.
Method: We propose a human-centered paradigm, introducing the MhAIM dataset of 154,552 multimodal social media posts (111,153 AI-generated) and defining three metrics: trustworthiness, impact, and openness. We further develop HR-MCP (Human Response Model Context Protocol), the first framework embedding human-response prediction into LLM interaction pipelines, and T-Lens, an LLM-based agent system that aligns model outputs with predicted human responses via multimodal inconsistency detection. HR-MCP is built on the standardized Model Context Protocol (MCP), enabling plug-and-play integration of human-response prediction with any LLM.
Results: A human study shows that people identify AI-generated content more accurately when posts combine text and visuals, especially when the two are inconsistent. T-Lens achieves strong performance in user-response simulation while improving interpretability and interaction, offering a scalable, cognition-aligned pathway for AIGC governance.
📝 Abstract
As AI-generated content becomes widespread, so does the risk of misinformation. While prior research has primarily focused on identifying whether content is authentic, much less is known about how such content influences human perception and behavior. In domains like financial trading, predicting how people react (e.g., whether a news post will go viral) can be more critical than verifying its factual accuracy. To address this, we take a human-centered approach and introduce the MhAIM Dataset, which contains 154,552 online posts (111,153 of them AI-generated), enabling large-scale analysis of how people respond to AI-generated content. Our human study reveals that people are better at identifying AI content when posts include both text and visuals, particularly when inconsistencies exist between the two. We propose three new metrics (trustworthiness, impact, and openness) to quantify how users judge and engage with online content. We present T-Lens, an LLM-based agent system designed to answer user queries by incorporating predicted human responses to multimodal information. At its core is HR-MCP (Human Response Model Context Protocol), built on the standardized Model Context Protocol (MCP), enabling seamless integration with any LLM. This integration allows T-Lens to better align with human reactions, enhancing both interpretability and interaction capabilities. Our work provides empirical insights and practical tools for equipping LLMs with human-awareness capabilities. By highlighting the complex interplay among AI, human cognition, and information reception, our findings suggest actionable strategies for mitigating the risks of AI-driven misinformation.
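The plug-and-play idea behind HR-MCP can be pictured as an LLM-callable tool that returns predicted human-response scores for a post. The sketch below is purely illustrative: the registry, the `predict_human_response` tool, and the score heuristic are hypothetical stand-ins (not the paper's actual model or the real MCP SDK), showing only the shape of such an integration.

```python
# Hypothetical sketch of a plug-and-play "human response" tool, loosely
# mirroring how an HR-MCP-style module might expose predictions to an
# LLM agent. All names and heuristics here are illustrative assumptions.
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Post:
    text: str
    has_image: bool
    image_text_consistent: bool  # do the visuals match the text?

TOOLS: Dict[str, Callable] = {}  # stand-in for an MCP tool registry

def tool(name: str):
    """Register a function under a tool name (toy registry)."""
    def register(fn: Callable) -> Callable:
        TOOLS[name] = fn
        return fn
    return register

@tool("predict_human_response")
def predict_human_response(post: Post) -> Dict[str, float]:
    """Return toy scores for the paper's three metrics.

    Placeholder heuristic: multimodal posts with image-text
    inconsistencies are easier for humans to flag as AI-generated,
    so predicted trustworthiness drops in that case.
    """
    trust = 0.8
    if post.has_image and not post.image_text_consistent:
        trust = 0.3
    return {"trustworthiness": trust, "impact": 0.5, "openness": 0.5}

# An agent would invoke the registered tool by name before answering:
scores = TOOLS["predict_human_response"](
    Post(text="Breaking news!", has_image=True, image_text_consistent=False)
)
```

The registry-plus-named-tool pattern is what makes the module model-agnostic: any LLM agent loop that can emit a tool name and arguments can consume the predictions without knowing how they are computed.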