🤖 AI Summary
This study addresses the absence of dedicated benchmarks for automatically generating policy and operational recommendations from institutional reports. It introduces the task of report-driven policy and operational recommendation generation, together with the first high-quality dataset and a structured evaluation framework tailored to this objective. Unlike conventional recommender systems, this work leverages large language models (LLMs) to identify critical issues within textual reports and generate reflective, actionable suggestions. Experimental results indicate that state-of-the-art LLMs show promise on this task, and the proposed framework establishes a reliable benchmark and methodological foundation for future research in this emerging domain.
📝 Abstract
Large Language Models (LLMs) are extensively used in text generation tasks. These generative capabilities bring us to a point where LLMs could potentially provide useful insights for policy making and agency operations. In this paper, we introduce a new task: generating recommendations that can inform future actions and improvements in agencies' work within private and public organisations. In particular, we present the first benchmark and a coherent evaluation framework for developing recommendation systems that inform organisational policies. Unlike conventional product or user recommendation systems, this task aims to suggest policy improvements based on the conclusions drawn from reports. Our results demonstrate that state-of-the-art LLMs have the potential to emphasize and reflect on key issues and learning points within generated recommendations.
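To make the task concrete, below is a minimal illustrative sketch of prompting an LLM to turn a report into policy and operational recommendations. This is not the paper's actual pipeline: the model name, the prompt wording, and the use of the `OpenAI` chat-completions client are assumptions chosen only to show the shape of the task.

```python
# Illustrative sketch only: asking an LLM to identify key issues in a report
# and draft actionable recommendations. Prompt wording and model choice are
# assumptions, not the benchmark's actual setup.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def recommend_from_report(report_text: str, n_recommendations: int = 3) -> str:
    """Generate policy/operational recommendations from a report's text."""
    prompt = (
        "You are advising an organisation on policy and operations.\n"
        "Read the report below, identify the key issues and learning points, "
        f"and propose {n_recommendations} concrete, actionable recommendations "
        "to inform future actions and improvements.\n\n"
        f"Report:\n{report_text}"
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; any capable instruction-tuned LLM
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```

In this framing, evaluation then compares such generated recommendations against reference recommendations, which is what the proposed benchmark and evaluation framework are designed to support.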