📝 Abstract
The automated summarisation of parliamentary debates using large language models (LLMs) offers a promising way to make complex legislative discourse more accessible to the public. However, such summaries must not only be accurate and concise but also equitably represent the views and contributions of all speakers. This paper explores the use of LLMs to summarise plenary debates of the European Parliament and investigates the algorithmic and representational biases that emerge in this context. We propose a structured, multi-stage summarisation framework that improves textual coherence and content fidelity, while enabling systematic analysis of how speaker attributes -- such as speaking order or political affiliation -- influence the visibility and accuracy of their contributions in the final summaries. Through experiments with both proprietary and open-weight LLMs, we find evidence of consistent positional and partisan biases, with certain speakers systematically under-represented or misattributed. Our analysis shows that these biases vary by model and summarisation strategy, with hierarchical approaches offering the greatest potential to reduce disparity. These findings underscore the need for domain-sensitive evaluation metrics and ethical oversight in the deployment of LLMs for democratic applications.