Stance-Driven Multimodal Controlled Statement Generation: New Dataset and Task

๐Ÿ“… 2025-04-04
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
This work addresses the lack of high-quality datasets and effective modeling approaches for multimodal stance-controllable generation in political contexts. We formally define the novel task of multimodal stance-driven controllable generation. To this end, we introduce StanceGen2024โ€”the first multimodal stance generation dataset specifically curated from election-related political discourseโ€”and propose the Stance-Driven Multimodal Generation (SDMG) framework. SDMG integrates modality-weighted feature fusion, stance embedding guidance, cross-modal alignment modeling, and large language model (LLM)-based controllable text generation. Evaluated on StanceGen2024, SDMG achieves substantial improvements in stance accuracy (+12.3%) and semantic consistency (BLEU-4 +9.7%). Both the dataset and source code are publicly released to foster reproducible research.

Technology Category

Application Category

๐Ÿ“ Abstract
Formulating statements that support diverse or controversial stances on specific topics is vital for platforms that enable user expression, reshape political discourse, and drive social critique and information dissemination. With the rise of Large Language Models (LLMs), controllable text generation towards specific stances has become a promising research area with applications in shaping public opinion and commercial marketing. However, current datasets often focus solely on pure texts, lacking multimodal content and effective context, particularly in the context of stance detection. In this paper, we formally define and study the new problem of stance-driven controllable content generation for tweets with text and images, where given a multimodal post (text and image/video), a model generates a stance-controlled response. To this end, we create the Multimodal Stance Generation Dataset (StanceGen2024), the first resource explicitly designed for multimodal stance-controllable text generation in political discourse. It includes posts and user comments from the 2024 U.S. presidential election, featuring text, images, videos, and stance annotations to explore how multimodal political content shapes stance expression. Furthermore, we propose a Stance-Driven Multimodal Generation (SDMG) framework that integrates weighted fusion of multimodal features and stance guidance to improve semantic consistency and stance control. We release the dataset and code (https://anonymous.4open.science/r/StanceGen-BE9D) for public use and further research.
Problem

Research questions and friction points this paper is trying to address.

Generate stance-controlled tweets with text and images
Address lack of multimodal datasets for stance generation
Improve semantic consistency in stance-driven content creation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal stance-controllable text generation
Weighted fusion of multimodal features
Stance-driven guidance for semantic consistency
๐Ÿ”Ž Similar Papers
No similar papers found.