Reranking-based Generation for Unbiased Perspective Summarization

šŸ“… 2025-06-19
šŸ“ˆ Citations: 0
✨ Influential: 0
šŸ“„ PDF
šŸ¤– AI Summary
Evaluating and generating unbiased summaries, particularly for politically sensitive topics, remains challenging due to unreliable automatic metrics and weak methodological foundations for perspective-aware summarization. Method: We introduce the first human-annotated benchmark specifically designed for perspective-aware summarization. Leveraging this benchmark, we systematically validate that LM-as-a-judge metrics significantly outperform traditional metrics in assessing coverage and faithfulness (human correlation ≄ 0.87). We further propose a reranking-driven generation paradigm that employs synthetic data augmentation and preference fine-tuning guided by reranking labels to overcome zero-shot limitations. Results: Experiments demonstrate consistent, statistically significant improvements in coverage, faithfulness, and stance balance, surpassing all baselines. This work establishes both an evaluation standard and a scalable, effective methodology for generating balanced, perspective-aware summaries with LLMs.
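The reranking-based generation loop described in the summary can be sketched in a few lines. This is an illustrative sketch, not the authors' code: `generate_candidate` and `judge_score` are hypothetical stand-ins for an LLM sampler and an LM-as-a-judge scorer of coverage/faithfulness.

```python
# Illustrative sketch of reranking-based generation (not the paper's
# implementation). generate_candidate stands in for an LLM sampler and
# judge_score for an LM-as-a-judge metric over coverage/faithfulness.

def rerank_generate(prompt, generate_candidate, judge_score, n=8):
    """Sample n candidate summaries and return the highest-scoring one."""
    candidates = [generate_candidate(prompt) for _ in range(n)]
    scored = [(judge_score(prompt, c), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # best score first
    return scored[0][1]
```

In practice the judge would be an LLM prompted to rate each candidate along the paper's quality dimensions; here any scoring function works.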

šŸ“ Abstract
Generating unbiased summaries in real-world settings such as political perspective summarization remains a crucial application of Large Language Models (LLMs). Yet, existing evaluation frameworks rely on traditional metrics for measuring key attributes such as coverage and faithfulness without verifying their applicability, and efforts to develop improved summarizers are still nascent. We address these gaps by (1) identifying reliable metrics for measuring perspective summary quality, and (2) investigating the efficacy of LLM-based methods beyond zero-shot inference. Namely, we build a test set for benchmarking metric reliability using human annotations and show that traditional metrics underperform compared to language model-based metrics, which prove to be strong evaluators. Using these metrics, we show that reranking-based methods yield strong results, and preference tuning with synthetically generated and reranking-labeled data further boosts performance. Our findings aim to contribute to the reliable evaluation and development of perspective summarization methods.
Problem

Research questions and friction points this paper is trying to address.

Identifying reliable metrics for unbiased perspective summarization quality
Evaluating LLM-based methods beyond zero-shot inference for summarization
Improving perspective summarization using reranking and preference tuning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Using reranking-based methods for summarization
Preference tuning with synthetic reranking-labeled data
Evaluating with reliable language model-based metrics
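The second innovation, preference tuning on reranking-labeled data, can be sketched as follows. This is a hedged illustration of the general idea, not the authors' pipeline: `judge_score` is a hypothetical LM-as-a-judge scorer, and the output dicts follow a common DPO-style prompt/chosen/rejected format.

```python
# Illustrative sketch: turning reranking labels on synthetic candidates
# into preference pairs for DPO-style tuning (hypothetical; not the
# authors' actual data pipeline).

def build_preference_pairs(prompt, candidates, judge_score):
    """Pair the judge's top-ranked candidate (chosen) against every
    other candidate (rejected), yielding preference-tuning examples."""
    ranked = sorted(candidates, key=lambda c: judge_score(prompt, c), reverse=True)
    chosen = ranked[0]
    return [{"prompt": prompt, "chosen": chosen, "rejected": r} for r in ranked[1:]]
```

Pairs in this shape can feed standard preference-optimization trainers, which is one plausible way the reranking signal could supervise fine-tuning.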