NEWSAGENT: Benchmarking Multimodal Agents as Journalists with Real-World Newswriting Tasks

📅 2025-08-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates the capability of multimodal autonomous agents in real-world news writing, specifically their ability to bridge information gaps and generate structured journalistic narratives. Method: We introduce the first multimodal agent benchmark tailored for news writing, requiring agents to autonomously perform web navigation, cross-source multimodal information retrieval, factual filtering, and narrative integration—unifying webpage-level multimodal exploration with narrative planning to emulate journalists’ active information-gap filling. Our approach integrates LLMs with state-of-the-art agent frameworks, leveraging keyword-based retrieval, historical context grounding, and multi-step reasoning for end-to-end news generation. Contribution/Results: Experiments reveal that while current agents excel at factual retrieval, they exhibit significant bottlenecks in task decomposition, long-horizon planning, and narrative coherence. The benchmark provides a quantifiable evaluation standard and concrete optimization directions for advancing multimodal autonomous agents.

📝 Abstract
Recent advances in autonomous digital agents from industry (e.g., Manus AI and Gemini's research mode) highlight their potential for structured tasks through autonomous decision-making and task decomposition; however, it remains unclear to what extent agent-based systems can improve multimodal web data productivity. We study this in the realm of journalism, which requires iterative planning, interpretation, and contextual reasoning over multimodal raw content to form a well-structured news article. We introduce NEWSAGENT, a benchmark for evaluating how agents can automatically search available raw content, select the desired information, and edit and rephrase it into a news article, exercising core journalistic functions. Given a writing instruction and firsthand data, mirroring how a journalist initiates a news draft, agents are tasked to identify narrative perspectives, issue keyword-based queries, retrieve historical background, and generate complete articles. Unlike typical summarization or retrieval tasks, essential context is not directly available and must be actively discovered, reflecting the information gaps faced in real-world news writing. NEWSAGENT includes 6k human-verified examples derived from real news, with multimodal content converted to text for broad model compatibility. We evaluate open- and closed-source LLMs with commonly used agentic frameworks on NEWSAGENT, which shows that agents are capable of retrieving relevant facts but struggle with planning and narrative integration. We believe that NEWSAGENT serves as a realistic testbed for iterating on and evaluating agent capabilities in turning multimodal web data manipulation into real-world productivity.
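The task loop the abstract describes (identify narrative perspectives, issue keyword-based queries, retrieve historical background, generate the article) can be sketched as a simple pipeline. This is an illustrative sketch only: every class and function name below is hypothetical, not from the paper or its benchmark code, and real agents would back each step with LLM calls and web retrieval rather than the placeholder logic shown here.

```python
from dataclasses import dataclass

# Hypothetical sketch of the NEWSAGENT-style task loop; all names are
# illustrative assumptions, not the paper's actual API.

@dataclass
class NewsTask:
    instruction: str           # writing instruction given to the agent
    firsthand_data: list[str]  # multimodal raw content, converted to text

def identify_perspectives(task: NewsTask) -> list[str]:
    # Placeholder: a real agent would prompt an LLM to propose angles.
    return [f"angle: {task.instruction.split()[0].lower()}"]

def keyword_queries(perspectives: list[str]) -> list[str]:
    # Turn each narrative angle into a keyword-based search query.
    return [p.removeprefix("angle: ") for p in perspectives]

def retrieve_background(queries: list[str], corpus: dict[str, str]) -> list[str]:
    # Simulated retrieval: look queries up in a toy historical corpus.
    return [corpus[q] for q in queries if q in corpus]

def write_article(task: NewsTask, background: list[str]) -> str:
    # Placeholder generation step: concatenate sources into a draft.
    return " ".join(task.firsthand_data + background)

corpus = {"earthquake": "Background: the region last saw a quake in 2011."}
task = NewsTask("Earthquake strikes coastal city",
                ["Firsthand: tremors felt at 6:02 am."])
draft = write_article(task, retrieve_background(
    keyword_queries(identify_perspectives(task)), corpus))
print(draft)
```

The point of the sketch is the structure, not the stubs: the background fact is never in the firsthand input and only reaches the draft through the agent's own query-and-retrieve step, which is exactly the information-gap filling the benchmark tests.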
Problem

Research questions and friction points this paper is trying to address.

Evaluating agents' ability to automate news writing from multimodal web data
Assessing how agents retrieve, select, and edit information for journalism
Measuring agent performance in planning and narrative integration tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Benchmarking multimodal agents for journalism tasks
Automated search, selection, and editing of raw content
Evaluating agent capabilities in real-world news writing